Dynamic adaptation of device interfaces in a voice-based system

ABSTRACT

Implementations relate to dynamic adaptation of images for projection by a projector, based on one or more properties of user(s) that are in an environment with the projector. The projector can be associated with an automated assistant client of a client device. In some versions of those implementations, a pose of a user in the environment is determined and, based on the pose, a base image for projecting onto a surface is warped to generate a transformed image. The transformed image, when projected onto a surface and viewed from the pose of the user, mitigates perceived differences relative to the base image.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit of priority under 35 U.S.C. § 120 as a continuation of U.S. patent application Ser. No. 15/973,456, titled “DYNAMIC ADAPTATION OF DEVICE INTERFACES IN A VOICE-BASED SYSTEM,” filed May 7, 2018, which claims the benefit of priority under 35 U.S.C. § 120 as a continuation-in-part of U.S. patent application Ser. No. 15/955,297, titled “DYNAMIC ADAPTATION OF IMAGES FOR PROJECTION, AND/OR OF PROJECTION PARAMETERS, BASED ON USER(S) IN ENVIRONMENT,” filed Apr. 17, 2018, each of which is incorporated by reference herein in its entirety.

BACKGROUND

An automated assistant (also known as a “personal assistant”, “mobile assistant”, etc.) can be interacted with by a user via a variety of client devices such as projectors, smart phones, tablet computers, wearable devices, automobile systems, and/or standalone personal assistant devices. An automated assistant receives input from the user such as typed input, touch input, and/or spoken natural language input. The automated assistant can respond with responsive content such as visual and/or audible natural language output. An automated assistant interacted with via a client device can be implemented via the client device itself and/or one or more remote computing devices, such as (but not limited to) computing device(s) in “the cloud”, that are connected to the client device via a network.

SUMMARY

This disclosure relates to systems, methods, and apparatus for dynamic adaptation of images for projection by a projector, and/or of projection parameters, based on one or more properties of user(s) that are in an environment with the projector. Typically, as a user moves within a room, an image projected onto the same position of a wall will appear, to the user, to change due to the change in the user's perspective as the user moves within the room. In other words, in such a typical situation the image projected onto the wall remains the same as the user moves within the room, but the projected image, as perceived by the user, appears to change. In implementations disclosed herein, a base image can be transformed, in dependence on a pose of a user, to generate a transformed image. As used herein, a “pose” references a position of a user, and optionally also an orientation of the user. The transformed image is different from the base image, but is generated such that, when projected, it appears substantially similar to the base image when viewed from the pose of the user. Stated differently, if viewed from the same pose, the projected base image and the projected transformed image would be perceivable as different by the user. However, if the projected base image were viewed by the user from a first pose (e.g., “straight on”) and the projected transformed image were viewed by the user from a second pose (e.g., at a 70° angle relative to “straight on”), they would be perceived as the same.

Accordingly, various implementations disclosed herein can selectively and dynamically transform base images in dependence on a pose of a user. Through the selective and dynamic transformation of base images, transformed images can selectively be projected in lieu of their base image counterparts, such that projected images, when viewed by the user, appear substantially similar to their base image counterparts. It will be understood that some user poses will require no dynamic transformation, in which case the base image itself can be projected to the user. As used herein, a “base image” references a single image frame, and optionally also an image that is part of a sequence of images that form a video or other dynamic sequence of images.

In many implementations, an automated assistant can identify active user(s) of the automated assistant in determining how to dynamically transform an image. For example, where multiple users are present, the automated assistant can identify a subset of those users as active users, determine at least one pose of the subset, and transform an image in dependence on the at least one pose of the subset. Active users can be identified by an automated assistant in a number of ways, including by movement, location, pose, facial identification, voice identification, and/or gaze. In some implementations, rooms can contain more than one person and various numbers of active users. As an illustrative example, a room can contain one person and no active users, one person and one active user, several people and one active user, and/or several people and several active users. The number of active users in a room can change over time, and a redetermination of active users by the automated assistant can be used to determine new image transformation parameters to use in transforming base images for projection.
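The active-user determination described above can be sketched as a filter followed by a reduction to a single pose. This is a minimal illustration under stated assumptions: the `ObservedUser` record, the rule that gazing at the surface or recently speaking marks a user as active, and using the centroid of the active users' positions as the one pose are all hypothetical choices, not the method of this disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ObservedUser:
    # Hypothetical per-user signals derived from sensor data.
    user_id: str
    position: Tuple[float, float]  # (x, y) in meters, room coordinates
    gazing_at_surface: bool
    recently_spoke: bool


def select_active_users(users: List[ObservedUser]) -> List[ObservedUser]:
    """Assumed rule: a user is active if gazing at the projection surface
    or if they recently addressed the assistant."""
    return [u for u in users if u.gazing_at_surface or u.recently_spoke]


def representative_pose(active: List[ObservedUser]) -> Optional[Tuple[float, float]]:
    """Collapse the active subset to one position (their centroid) for
    computing transformation parameters; None when nobody is active."""
    if not active:
        return None
    xs = [u.position[0] for u in active]
    ys = [u.position[1] for u in active]
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

Rerunning `select_active_users` on fresh sensor data and comparing the result with the previous subset is one way to trigger the redetermination of transformation parameters mentioned above.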

In a variety of implementations, images can be dynamically transformed (or “warped”) by the automated assistant so the image appears the same as an active user moves within a room. Image warping can be a linear transformation, and can include a variety of processes including rotating the image, scaling the image, and skew adjusting the image. As an illustrative example of image warping, assume a base image that includes a pair of parallel lines. If the base image is projected, the lines would appear parallel to a user that is viewing the projection from a pose that is perpendicular to (e.g., directly in front of) a surface on which the projection is provided. However, if the user were instead viewing the projection of the base image from a non-perpendicular angle (e.g., from the side), the lines would appear non-parallel. Generating a transformed image based on warping the base image, and projecting the transformed image in lieu of the base image, can lead to the user still perceiving the lines as parallel even when the user is at a non-perpendicular angle (e.g., from the side). In other words, the user's perception of the projection of the transformed image can be more similar to the base image than would be the user's perception of a projection of the base image itself.
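The rotation, scaling, and skew adjustments named above can each be written as a 2x2 matrix and combined by matrix multiplication. The sketch below is illustrative only: the function names are hypothetical, and the matrices act on (x, y) coordinates about the image origin.

```python
import math
from typing import Tuple

Matrix = Tuple[Tuple[float, float], Tuple[float, float]]
Point = Tuple[float, float]


def rotation(theta_rad: float) -> Matrix:
    c, s = math.cos(theta_rad), math.sin(theta_rad)
    return ((c, -s), (s, c))


def scaling(sx: float, sy: float) -> Matrix:
    return ((sx, 0.0), (0.0, sy))


def skew(kx: float, ky: float = 0.0) -> Matrix:
    # Shear: x' = x + kx * y, y' = ky * x + y
    return ((1.0, kx), (ky, 1.0))


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b: the result applies b first, then a."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def apply(m: Matrix, p: Point) -> Point:
    """Map one (x, y) point through the 2x2 matrix m."""
    return (m[0][0] * p[0] + m[0][1] * p[1], m[1][0] * p[0] + m[1][1] * p[1])
```

One property worth keeping in mind: any map of this purely linear form keeps parallel lines parallel, so undoing the perspective convergence in the parallel-lines example is commonly handled with a projective transform (homography) in practice.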

In some additional or alternative implementations, a base image can be generated and/or identified based on a distance of the user, where the distance of the user is indicative of the distance between the user and the surface upon which the image is projected (e.g., the distance can be based on a distance between the user and the projector, and optionally the distance from the projector to the surface). For example, interactive user interface (UI) elements can be included or excluded in a base image depending on the distance of the user from the projected image. For instance, when a user is relatively far away from a projection surface (e.g., more than 5 feet away or another “unreachable” distance), a base image can be identified or generated that lacks any interactive UI elements. In contrast, when a user is relatively close to a projection surface (e.g., within “reach” of the projection surface), a base image can be identified or generated that includes interactive UI elements. As another example, a first base image can be identified or generated when a user is within a first range of distances of the projection surface, and a second base image can be identified or generated when the user is instead within a second range of distances of the projection surface. For instance, in response to a user's request for “weather”, either the first base image or the second base image can be identified and/or generated for projection, in dependence on a distance of the user. The first range of distances can include farther distances, and the first base image can include less information, such as only today's weather report. In contrast, the second range of distances can include closer distances, and the second base image can include more information, such as today's weather report and the weather report for one or more additional days. In some implementations, the projected image can be touch sensitive, giving a user close enough to touch the projection a modality via which to interact with the automated assistant (e.g., in addition to voice and/or gesture modalities).
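The distance-dependent choice of base image can be sketched as a small lookup. In this sketch the 5-foot threshold is taken from the example above, while collapsing the "reach" test and the two distance ranges into one threshold, the `BaseImageSpec` record, and the three-day forecast for nearby users are assumptions made for illustration.

```python
from dataclasses import dataclass

# From the example above: more than 5 feet is treated as "unreachable".
REACH_THRESHOLD_FT = 5.0


@dataclass
class BaseImageSpec:
    # Hypothetical description of which weather base image to build.
    days_of_forecast: int
    interactive_elements: bool


def weather_base_image_for_distance(distance_ft: float) -> BaseImageSpec:
    """Pick a weather base image variant from the user's distance."""
    if distance_ft > REACH_THRESHOLD_FT:
        # Farther range: today's report only, no touch targets.
        return BaseImageSpec(days_of_forecast=1, interactive_elements=False)
    # Closer range: additional days (3 is an arbitrary choice) plus touchable UI.
    return BaseImageSpec(days_of_forecast=3, interactive_elements=True)
```

Adding more ranges would turn the single comparison into a sorted list of (upper bound, spec) pairs.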

Implementations disclosed herein can enhance the usability of an automated assistant through dynamic adaptation of what content is projected and/or how it is projected. Such dynamic adaptations can enable more accurate and/or more comprehensible representations of projected content from a variety of viewing poses. Such dynamic adaptations can be of benefit to, for example, users with low dexterity who may be constrained with respect to the poses from which they can view projected automated assistant content. In some additional or alternative implementations, generating and projecting a transformed image can reduce the duration of time for which a projector must project the transformed image, thereby conserving power resources that would otherwise be required to project the image for a longer duration. For example, when a user views a projected transformed image, the user can comprehend the information presented in the image more quickly (i.e., relative to if a base image were instead projected) and/or is less likely to need to move to understand the information in the projected image. This enables the projector to cease projecting the transformed image more quickly. The projector can cease projecting the transformed image, for example, in response to a user command to dismiss the image (e.g., a command that requests additional content that will supplant the image) or as a time-out after determining the user is no longer viewing the image. In some additional or alternative implementations, a cloud-based automated assistant component can send a base image and the client device can generate transformation(s) of the base image locally, obviating the need for further network communications between the client and the cloud-based automated assistant component to request and transmit transformation(s). This can lessen the amount of data exchanged between the cloud-based automated assistant component and the client, since the cloud-based automated assistant component only needs to send a single base image instead of needing to send multiple image transformations along with each base image.

The above description is provided as an overview of some implementations disclosed herein. Additional description of these and other implementations is set forth in more detail herein.

In some implementations, a method is provided and includes identifying, by an automated assistant client of a computing device in an environment, a base image for projecting onto a surface via a projector accessible to the automated assistant client. The method further includes determining, using sensor data from at least one sensor, a first pose of a user in the environment. The sensor data is accessible to the automated assistant client. The method further includes determining, using the first pose of the user, first image transformation parameters for warping images. The method further includes generating a first transformed image that is a transformation of the base image, and causing the projector to project the first transformed image onto the surface. Generating the first transformed image includes using the first image transformation parameters to warp the base image. The first transformed image, when projected onto the surface and viewed from the first pose of the user, mitigates perceived differences relative to the base image. The method further includes determining, using additional sensor data from the at least one sensor, a second pose of the user in the environment, where the second pose of the user indicates the user has moved. The method further includes determining, using the second pose of the user, second image transformation parameters for warping images. The method further includes generating a second transformed image that is a transformation of the base image or of an additional base image, and causing the projector to project the second transformed image onto the surface. Generating the second transformed image includes using the second image transformation parameters to warp the base image or the additional base image. The second transformed image, when projected onto the surface and viewed from the second pose of the user, mitigates perceived differences relative to the base image or the additional base image.
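One way to picture the first-pose/second-pose sequence above is a tracker that recomputes transformation parameters only when the user has moved. The sketch assumes a deliberately simplified geometry: the pose is reduced to a 2-D position, the surface lies along the x-axis, and foreshortening is modeled orthographically, so a width seen at angle θ from the surface normal appears scaled by cos θ and is pre-stretched by 1/cos θ. The `TransformTracker` class, the 0.25 m movement tolerance, and the 80° clamp are hypothetical.

```python
import math
from typing import Optional, Tuple

Pose = Tuple[float, float]  # (x, y) meters; surface on the x-axis, y = distance from it


def stretch_factor_for_pose(pose: Pose, image_center_x: float = 0.0,
                            max_angle_deg: float = 80.0) -> float:
    """Horizontal pre-stretch offsetting foreshortening seen from `pose`.

    Simplified orthographic model (an assumption): a width viewed at angle
    theta from the surface normal appears scaled by cos(theta).
    """
    dx = pose[0] - image_center_x
    theta = math.atan2(abs(dx), pose[1])
    theta = min(theta, math.radians(max_angle_deg))  # avoid blow-up near 90 degrees
    return 1.0 / math.cos(theta)


class TransformTracker:
    """Recompute the transformation parameter only after noticeable movement."""

    def __init__(self, move_tolerance_m: float = 0.25):
        self.move_tolerance_m = move_tolerance_m
        self._last_pose: Optional[Pose] = None
        self._factor = 1.0

    def update(self, pose: Pose) -> float:
        moved = (self._last_pose is None
                 or math.dist(pose, self._last_pose) > self.move_tolerance_m)
        if moved:
            self._last_pose = pose
            self._factor = stretch_factor_for_pose(pose)
        return self._factor
```

Under this model, the 70° viewing angle from the summary would call for a horizontal pre-stretch of roughly 2.9 (1/cos 70°); a fuller implementation would also account for perspective, the vertical axis, and the projector's own position.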

These and other implementations of the technology disclosed herein can include one or more of the following features.

In some implementations, the base image is received, via a network interface of the computing device, from a remote automated assistant component that interfaces with the automated assistant client.

In some implementations, the base image is generated by the automated assistant client based on data received, via a network interface of the computing device, from a remote automated assistant component that interfaces with the automated assistant client.

In some implementations, the method further includes determining a distance of the surface to the projector using second sensor data from a second sensor. The second sensor data is accessible to the automated assistant client. In some versions of those implementations, determining the first image transformation parameters for warping images includes determining the first image transformation parameters using the first pose of the user and using the distance of the surface to the projector. In some additional or alternative versions of those implementations, generating the second transformed image includes using the second pose of the user and the distance from the surface to the projector to warp the base image.

In some implementations, generating the first transformed image that is the transformation of the base image includes performing at least one linear transformation on the base image. In some of those implementations, the at least one linear transformation is selected from a group consisting of rotation of the base image, scaling of the base image, and skew adjustment of the base image.

In some implementations, the base image has first dimensions and the first transformed image has the same first dimensions. In some of those implementations, the base image includes base image pixels each having corresponding values assigned thereto, and the transformed image includes transformed image pixels. The transformed image pixels have the same corresponding values as the base image pixels, but the assignment of the same corresponding values to the transformed image pixels differs from the assignment of the corresponding values to the base image pixels in the base image. For example, a given transformed image pixel, having a given X and Y position in the transformed image, can have the same values as a given base image pixel of the base image, where the given base image pixel has a different X and Y position in the base image.
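The pixel reassignment described above matches how inverse-mapping warps typically work: each output pixel looks up which base-image position it should copy its value from. The sketch below keeps the base image's dimensions and uses nearest-neighbor lookup; the `inverse_map` callback and the `fill` value for pixels that map outside the base image are assumptions, since the text does not say how such pixels are handled.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # rows of pixel values, indexed image[y][x]


def warp_same_size(base: Image,
                   inverse_map: Callable[[int, int], Tuple[float, float]],
                   fill: int = 0) -> Image:
    """Build a transformed image with the same dimensions as `base`.

    For each output pixel (x, y), copy the value of the base pixel nearest
    to inverse_map(x, y). Output pixels whose source falls outside the base
    image receive `fill`.
    """
    h, w = len(base), len(base[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = inverse_map(x, y)
            ix, iy = round(sx), round(sy)  # nearest-neighbor sampling
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = base[iy][ix]
    return out
```

With a mirror mapping every value is reused at a different X position, which is the situation the passage above describes; with a shift, some output pixels fall back to `fill`.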

In some implementations, the method further includes determining, based on the first pose of the user, a desired size for the projection of the first transformed image. In some of those implementations, causing the projector to project the first transformed image onto the surface includes causing the projector to project the first transformed image to achieve the desired size for the projection.
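Achieving a desired projection size depends on the projector's geometry. A common projector specification is the throw ratio (throw distance divided by image width), so the full frame is about `distance / throw_ratio` wide. Under that assumption, the sketch below computes what fraction of the frame the content should occupy. The `desired_width_for_viewer` rule, which keeps a constant visual angle (20° by default) as the viewer moves, is a hypothetical way of deriving the desired size from the user's pose.

```python
import math


def content_scale_for_desired_width(throw_distance_m: float,
                                    throw_ratio: float,
                                    desired_width_m: float) -> float:
    """Fraction of the projector's full frame width the content should fill
    so that it lands about `desired_width_m` wide on the surface.

    Clamped to 1.0: a fixed-lens projector cannot exceed its full frame
    at this throw distance.
    """
    if throw_distance_m <= 0 or throw_ratio <= 0:
        raise ValueError("distance and throw ratio must be positive")
    full_width_m = throw_distance_m / throw_ratio
    return min(1.0, desired_width_m / full_width_m)


def desired_width_for_viewer(viewer_distance_m: float,
                             visual_angle_deg: float = 20.0) -> float:
    """Hypothetical rule: width that subtends a fixed visual angle for a
    viewer at `viewer_distance_m` from the surface."""
    return 2.0 * viewer_distance_m * math.tan(math.radians(visual_angle_deg) / 2.0)
```

For example, a user stepping back from the wall increases the desired width, and the content scale rises until it saturates at the full frame.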

In some implementations, identifying the base image includes selecting the base image, from a plurality of candidate base images, based on the first pose of the user. In some of those implementations, selecting the base image based on the first pose of the user includes: determining a distance of the user based on the first pose of the user; and selecting the base image based on the distance corresponding to a distance measure assigned to the base image. The distance can be, for example, relative to the projector or relative to the surface.

In some implementations, the method further includes generating, by the automated assistant client, the base image based on the pose of the user. In some of those implementations, generating the base image based on the pose of the user includes: determining the pose of the user is within a threshold distance of the surface; and, based on determining the pose of the user is within the threshold distance of the surface, generating the base image to include one or more interactive interface elements.

In some implementations, a method is provided and includes identifying, by an automated assistant client of a computing device, a base image for projecting onto a surface via a projector accessible to the automated assistant client. The method further includes identifying a plurality of users that are in an environment with the computing device, and determining, using sensor data from at least one sensor accessible to the computing device, that a subset of the users are active users for the automated assistant client. The method further includes determining at least one pose for the subset of the users that are determined to be active users. Determining the at least one pose is based on the sensor data or additional sensor data from at least one additional sensor accessible to the computing device. The method further includes, based on determining that the subset of the users are active users, using the at least one pose for the subset of the users in generating a transformed image of the base image. The method further includes causing the projector to project the transformed image onto the surface.

These and other implementations of the technology disclosed herein can include one or more of the following features.

In some implementations, the method further includes determining, using additional sensor data from the at least one sensor, that a second subset of the users are active users for the automated assistant client. The additional sensor data is generated at a time subsequent to the sensor data, and the second subset of the users varies from the first subset of the users. In those implementations, the method further includes: determining, based on the additional sensor data, at least one second pose for the second subset of the users that are determined to be active users; and, based on determining that the second subset of the users are active users, using the at least one second pose for the second subset of the users in generating a second transformed image of the base image, or of an additional image. In those implementations, the method further includes causing the projector to project the second transformed image onto the surface.

In some implementations, the method further includes: determining, based on the sensor data or additional sensor data, a gaze for the subset of the users that are determined to be active users; and generating the transformed image using the gaze of the one or more active users.

In some implementations, the method further includes determining, based on the pose of the subset of the users, a desired size for the projection of the transformed image. In some of those implementations, causing the projector to project the transformed image onto the surface includes causing the projector to project the transformed image to achieve the desired size for the projection.

In some implementations, a method is provided that includes identifying, by an automated assistant client of a computing device, a base image for projecting via a projector accessible to the automated assistant client. The method further includes identifying a plurality of users that are in an environment with the computing device. The method further includes determining, using sensor data from at least one sensor accessible to the computing device, that a subset of the users are active users for the automated assistant client. The method further includes determining at least one pose for the subset of the users that are determined to be active users. Determining the at least one pose is based on the sensor data or additional sensor data from at least one additional sensor accessible to the computing device. The method further includes, based on determining that the subset of the users are active users, using the at least one pose for the subset of the users in determining one or more projection parameters for a projection that includes the base image, or a transformed image that is a transform of the base image. The method further includes causing the projector to provide the projection.

These and other implementations of the technology disclosed herein can include one or more of the following features.

In some implementations, the one or more projection parameters include one or more of: a size of the base image or the transformed image in the projection, a size of the projection, a location of the base image or the transformed image in the projection, and a location of the projection.

According to at least one aspect of the disclosure, a system to generate interfaces in an audio-based networked system can include a computing device that can include one or more processors and a memory. The one or more processors can be configured to execute a natural language processor, a content selector component, and a transformation component. The computing device can receive an input audio signal that is detected by a sensor at a client device. The computing device can parse the input audio signal to identify a first request in the input audio signal and a keyword associated with the first request. The computing device can select a first base digital component based on at least the first request. The computing device can select a second base digital component based on at least the keyword associated with the first request. The computing device can determine a distance between the client device and a projection surface. The computing device can determine, based on the distance between the client device and the projection surface, transformation parameters for the first base digital component and the second base digital component. The transformation parameters can be configured to correct a skew of images projected onto the projection surface. The computing device can generate a first transformed image based at least on the transformation parameters and the first base digital component, and a second transformed image based at least on the transformation parameters and the second base digital component. The computing device can transmit the first transformed image and the second transformed image to the client device for projection onto the projection surface.

According to at least one aspect of the disclosure, a method to generate interfaces in an audio-based networked system can include receiving, by a natural language processor executed by one or more processors of a computing device, an input audio signal detected by a sensor at a client device. The method can include parsing, by the natural language processor, the input audio signal to identify a first request in the input audio signal and a keyword associated with the first request. The method can include selecting, by a content selector component of the computing device, a first base digital component based on at least the first request. The method can include selecting, by the content selector component, a second base digital component based on at least the keyword associated with the first request. The method can include determining, by a transformation component executed by the one or more processors of the computing device and based on sensor data from the client device, a distance between the client device and a projection surface. The method can include determining, by the transformation component, based on the distance between the client device and the projection surface, transformation parameters for the first base digital component and the second base digital component. The transformation parameters can be configured to correct a skew of images projected onto the projection surface. The method can include generating, by the transformation component, a first transformed image based at least on the transformation parameters and the first base digital component, and a second transformed image based at least on the transformation parameters and the second base digital component. The method can include transmitting, by the transformation component, the first transformed image and the second transformed image to the client device for projection onto the projection surface.
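The passages above do not specify how skew-correcting transformation parameters are computed from the measured distance. One common technique, swapped in here as an assumption, represents keystone correction as a 3x3 homography fitted to four corner correspondences (for example, where the projector's frame corners are estimated to land on the surface versus where the image should appear). The sketch below fits such a homography with plain Gaussian elimination; `homography`, `apply_h`, and `_solve` are hypothetical names, and the bottom-right entry is fixed to 1.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
H = List[List[float]]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> H:
    """3x3 homography mapping four src points onto four dst points."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = _solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(h: H, p: Point) -> Point:
    """Map a point through the homography, including the perspective divide."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

Which direction to fit (frame to surface or the reverse) depends on whether the warp is implemented by forward or inverse mapping; swapping `src` and `dst` yields the inverse homography.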

Other implementations may include a non-transitory computer readable storage medium storing instructions executable by one or more processors (e.g., central processing unit(s) (CPU(s)), graphics processing unit(s) (GPU(s)), and/or tensor processing unit(s) (TPU(s))) to perform a method such as one or more of the methods described above and/or elsewhere herein. Yet other implementations may include a system of one or more computers and/or one or more robots that include one or more processors operable to execute stored instructions to perform a method such as one or more of the methods described above and/or elsewhere herein.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating an example environment in which various implementations can be implemented.

FIG. 2A is a diagram illustrating an example scene of a user in a first pose in a room looking at an image projected onto a wall by a projector using an automated assistant.

FIG. 2B is a diagram illustrating an example scene of a user in a second pose in a room looking at an image projected at the same location onto the wall by a projector using an automated assistant.

FIG. 2C is a diagram illustrating an example of a projected image, as perceived by a user from a perspective that is directly perpendicular to the projected image.

FIG. 2D illustrates an example of the projected image of FIG. 2C, but as perceived by a user from a perspective that is not directly perpendicular to the projected image.

FIG. 3 is a flowchart illustrating an example process according to implementations disclosed herein.

FIG. 4 is a flowchart illustrating an example process according to implementations disclosed herein.

FIGS. 5A, 5B, 5C, and 5D are diagrams illustrating examples of image processing adjustments.

FIG. 6 is a flowchart illustrating an example process according to implementations disclosed herein.

FIG. 7 is a flowchart illustrating an example process according to implementations disclosed herein.

FIG. 8A is a diagram illustrating an example scene of a user in a first pose in a room looking at an image projected onto a wall by a projector using an automated assistant.

FIG. 8B is a diagram illustrating an example scene of a user in a second pose in a room looking at an image projected at the same location onto the wall by a projector.

FIG. 9 is a flowchart illustrating an example process according to implementations disclosed herein.

FIG. 10 illustrates a block diagram of an example method to generate interfaces in an audio-based, networked system according to implementations disclosed herein.

FIG. 11 is a block diagram illustrating an example architecture of a computing device.

DETAILED DESCRIPTION

FIG. 1 illustrates an example environment 100 in which various implementations can be implemented. The example environment 100 includes one or more client devices 102. Each client device 102 may execute a respective instance of an automated assistant client 112. One or more cloud-based automated assistant components 116, such as natural language processor 122 and digital component selector 126, may be implemented on one or more computing systems (collectively referred to as a “cloud” computing system) that are communicatively coupled with client devices 102 via one or more local and/or wide area networks 114 (e.g., the internet). The system 100 can include one or more digital component providers 128 that can provide digital components to the client device 102 via the cloud-based automated assistant components 116 and networks 114.

The system 100 can include one or more digital component providers 128. The digital component providers 128 can provide audio, visual, or multimedia-based digital components (which can also be referred to as content, images, or base images) for presentation by the client device 102 or the projector 106 as an audio- and visual-based output digital component. The digital component can be or include other digital components. The digital component can be or include a digital object. The digital component can be configured for a parametrically driven text-to-speech technique. The digital component can be configured for text-to-speech (TTS) implementations that convert normal language text into speech. For example, the digital component can include an image that is displayed on a projection surface while, via TTS, text related to the displayed image is presented to the user. The digital component can be input to an application programming interface that utilizes a speech-synthesis capability to synthesize text into natural-sounding speech in a variety of languages, accents, and voices. The digital component can be coded as plain text or in a speech synthesis markup language (SSML). SSML can include parameters that can be set to control aspects of speech, such as pronunciation, volume, pitch, or rate, that can form an acoustic fingerprint or native voice.
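Because a digital component can be coded in SSML, a minimal example of wrapping plain text with the prosody controls mentioned above (rate, pitch, volume) may help. The keyword values used are ones defined by the W3C SSML specification; the `to_ssml` helper itself is hypothetical, and individual TTS services may support only a subset of SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text: str, rate: str = "medium", pitch: str = "medium",
            volume: str = "medium") -> str:
    """Wrap plain text in an SSML <speak> root with one <prosody> element.

    rate, pitch, and volume take SSML keyword values such as "slow",
    "high", or "loud"; the "medium" defaults are illustrative choices.
    """
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>"
        f"{escape(text)}"  # escape &, <, > so the markup stays well-formed
        "</prosody></speak>"
    )
```

For instance, `to_ssml("Rain & wind expected", rate="slow")` yields markup with the ampersand escaped, which a TTS engine can then render more slowly than its default.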

The digital component provider 128 can provide selection criteria for the digital component, such as a value, keyword, concept, or other metadata or information to facilitate a content selection process. The digital component provider 128 can provide video-based digital components (or other digital components) to the content selector component 126, where they can be stored in a data repository. The content selector component 126 can select the digital components from the data repository and provide the selected digital components to the client device 102.

The digital component provider 128 can provide the digital component to the content selector component 126 for storage in the data repository in a content data structure. The content selector component 126 can retrieve the digital component responsive to a request for content from the client device 102 or upon otherwise determining to provide the digital component.

The digital component provider 128 can establish a digital componentcampaign (or electronic content campaign). A digital component campaigncan refer to one or more content groups that correspond to a commontheme. A content campaign can include a hierarchical data structure thatincludes content groups, digital component data objects (e.g., digitalcomponents or digital objects), and content selection criteria. Tocreate a digital component campaign, digital component provider 128 canspecify values for campaign level parameters of the digital componentcampaign. The campaign level parameters can include, for example, acampaign name, a preferred content network for placing digital componentobjects, a value of resources to be used for the digital componentcampaign, start and end dates for the content campaign, a duration forthe digital component campaign, a schedule for digital component objectplacements, language, geographical locations, type of computing deviceson which to provide digital component objects. In some cases, animpression can refer to when a digital component object is fetched fromits source and is countable. Due to the possibility of click fraud,robotic activity can be filtered and excluded, as an impression. Thus,an impression can refer to a measurement of responses from a Web serverto a page request from a browser, which is filtered from roboticactivity and error codes, and is recorded at a point as close aspossible to opportunity to render the digital component object fordisplay on the computing device 104. In some cases, an impression canrefer to a viewable or audible impression; e.g., the digital componentobject or digital component is at least partially (e.g., 20%, 30%, 30%,40%, 50%, 60%, 70%, or more) viewable on a display device of the clientdevice 102, or audible via a speaker of the client device 102. 
A click or selection can refer to a user interaction with the digital component object, such as a voice response to an audible impression, a mouse-click, touch interaction, gesture, shake, audio interaction, or keyboard click. A conversion can refer to a user taking a desired action with respect to the digital component object; e.g., purchasing a product or service, completing a survey, visiting a physical store corresponding to the digital component, or completing an electronic transaction.
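The impression rules above (filter robotic activity and error codes, then require the component to be partially viewable or audible) can be illustrated with a minimal Python sketch. It assumes each rendering event carries a bot flag, an HTTP-style status code, the visible fraction of the component, and whether it was played audibly; the `RenderEvent` type, the `is_countable_impression` function, and the 50% threshold are hypothetical choices for illustration, not values taken from the source.

```python
from dataclasses import dataclass

# Hypothetical threshold; the source lists 20% to 70% or more as examples.
VIEWABLE_FRACTION = 0.5


@dataclass
class RenderEvent:
    is_robotic: bool          # flagged as bot or crawler traffic
    status_code: int          # HTTP-style response code for the fetch
    visible_fraction: float   # share of the component on screen, 0.0 to 1.0
    audible: bool             # whether it was played through a speaker


def is_countable_impression(event: RenderEvent) -> bool:
    """Apply the filtering and viewability rules before counting an impression."""
    if event.is_robotic or event.status_code >= 400:
        return False  # exclude robotic activity and error codes
    return event.audible or event.visible_fraction >= VIEWABLE_FRACTION


events = [
    RenderEvent(False, 200, 0.7, False),  # viewable
    RenderEvent(True, 200, 1.0, False),   # robotic, filtered
    RenderEvent(False, 500, 1.0, False),  # server error, filtered
    RenderEvent(False, 200, 0.1, True),   # audible impression
    RenderEvent(False, 200, 0.2, False),  # below the viewable threshold
]
print(sum(is_countable_impression(e) for e in events))  # 2
```

Clicks and conversions would be recorded as separate, later events against an impression that already passed this check.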

The digital component provider 128 can establish one or more content groups for a digital component campaign. A content group includes one or more digital component objects and corresponding content selection criteria, such as keywords, words, terms, phrases, geographic locations, type of computing device, time of day, interest, topic, or vertical. Content groups under the same content campaign can share the same campaign level parameters, but may have tailored specifications for content group level parameters, such as keywords, negative keywords (e.g., that block placement of the digital component in the presence of the negative keyword on main content), or parameters associated with the content campaign.

To create a new content group, the digital component provider 128 can provide values for the content group level parameters of the content group. The content group level parameters include, for example, a content group name or content group theme, and bids for different content placement opportunities (e.g., automatic placement or managed placement) or outcomes (e.g., clicks, impressions, or conversions). A content group name or content group theme can be one or more terms that the digital component provider 128 can use to capture a topic or subject matter for which digital component objects of the content group are to be selected for display. For example, a food and beverage company can create a different content group for each brand of food or beverage it carries. Examples of the content group themes that the food and beverage company can use include “Brand A cola”, “Brand B ginger ale,” “Brand C orange juice,” “Brand D sports drink,” or “Brand E purified water.” An example content campaign theme can be “soda” and include content groups for both “Brand A cola” and “Brand B ginger ale”. The digital component (or digital component object) can include “Brand A”, “Brand B”, “Brand C”, “Brand D” or “Brand E”.
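The campaign hierarchy described above (campaign-level parameters shared by content groups, each group holding its own keywords and digital components) can be modeled as nested records. The following is a minimal sketch; the class names, field names, and the `all_components` helper are hypothetical and do not describe any particular ad platform's schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContentGroup:
    """Content-group-level parameters (hypothetical field names)."""
    theme: str
    keywords: list[str] = field(default_factory=list)
    negative_keywords: list[str] = field(default_factory=list)
    components: list[str] = field(default_factory=list)


@dataclass
class Campaign:
    """Campaign-level parameters shared by every content group below it."""
    name: str
    language: str
    locations: list[str]
    groups: list[ContentGroup] = field(default_factory=list)

    def all_components(self) -> list[str]:
        # Flatten the hierarchy: campaign -> content groups -> components.
        return [c for g in self.groups for c in g.components]


soda = Campaign(
    name="soda",
    language="en",
    locations=["US"],
    groups=[
        ContentGroup("Brand A cola", keywords=["cola", "soda"], components=["Brand A"]),
        ContentGroup("Brand B ginger ale", keywords=["ginger ale", "soda"], components=["Brand B"]),
    ],
)
print(soda.all_components())  # ['Brand A', 'Brand B']
```

Keeping language and locations on the campaign, rather than copying them into each group, mirrors the point that groups share campaign-level parameters while tailoring only their own keywords.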

The digital component provider 128 can provide one or more keywords and digital component objects to each content group. The keywords can include terms that are relevant to the products or services associated with or identified by the digital component objects. A keyword can include one or more terms or phrases. For example, the food and beverage company can include “soda,” “cola,” and “soft drink” as keywords for a content group or content campaign that can be descriptive of the goods or services the brand provides. In some cases, negative keywords can be specified by the content provider to avoid, prevent, block, or disable content placement on certain terms or keywords. The content provider can specify a type of matching, such as exact match, phrase match, or broad match, used to select digital component objects.
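The three match types and the blocking effect of negative keywords can be sketched as follows. This is a simplified illustration under stated assumptions: tokens are whitespace-separated lower-case words, "broad" is approximated as "every keyword term appears somewhere in the query" (production broad matching typically also covers synonyms and variants), and the function names `matches` and `eligible` are hypothetical.

```python
from __future__ import annotations


def _tokens(text: str) -> list[str]:
    return text.lower().split()


def matches(keyword: str, query: str, match_type: str) -> bool:
    """Return True if `keyword` matches `query` under the given match type."""
    kw, q = _tokens(keyword), _tokens(query)
    if match_type == "exact":
        return kw == q  # same terms, same order, nothing else
    if match_type == "phrase":
        n = len(kw)  # keyword terms must appear contiguously and in order
        return any(q[i:i + n] == kw for i in range(len(q) - n + 1))
    if match_type == "broad":
        # Simplification: every keyword term appears somewhere in the query.
        return set(kw) <= set(q)
    raise ValueError(f"unknown match type: {match_type}")


def eligible(keywords, negative_keywords, query, match_type="broad") -> bool:
    """A negative keyword found in the query blocks placement outright."""
    if any(matches(nk, query, "phrase") for nk in negative_keywords):
        return False
    return any(matches(k, query, match_type) for k in keywords)


print(eligible(["soda", "soft drink"], ["diet"], "cold soda near me"))  # True
print(eligible(["soda", "soft drink"], ["diet"], "diet soda"))          # False
```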

The digital component provider 128 can provide the one or more keywords to be used by the content selector component 126 to select a digital component object provided by the digital component provider 128. The digital component provider 128 can provide additional content selection criteria to be used by the content selector component 126 to select digital component objects. The content selector component 126 can run a content selection process involving multiple content providers 128 responsive to receiving an indication of a keyword of an electronic message.

The digital component provider 128 can provide one or more digital component objects for selection by the content selector component 126. The digital component objects can be a digital component or a collection of digital components. The content selector component 126 can select the digital component objects when a content placement opportunity becomes available that matches the resource allocation, content schedule, maximum bids, keywords, and other selection criteria specified for the content group. Different types of digital component objects can be included in a content group, such as a voice digital component, audio digital component, text digital component, image digital component, video digital component, multimedia digital component, or digital component link. Upon selecting a digital component, the content selector component 126 can transmit the digital component object for presentation or rendering on a client device 102 or a display device of the client device 102. Presenting or rendering can include displaying the digital component on a display device or playing the digital component via a speaker of the client device 102. The content selector component 126 can cause the client device 102 to present or render the digital component object. The content selector component 126 can instruct the client device 102 to generate audio signals, acoustic waves, or visual output. For example, the automated assistant client 108 can present the selected digital component via an audio output.

The instance of an automated assistant client 108, by way of its interactions with one or more cloud-based automated assistant components 116, may form what appears to be, from the user's perspective, a logical instance of an automated assistant 112 with which the user may engage in a dialogue. One instance of such an automated assistant 112 is depicted in FIG. 1 by a dashed line. It thus should be understood that each user that engages with an automated assistant client 108 executing on a client device 102 may, in effect, engage with his or her own logical instance of an automated assistant 112. For the sake of brevity and simplicity, the term “automated assistant”, as used herein to describe an assistant “serving” a particular user, may often refer to the combination of an automated assistant client 108 operated by the user and one or more cloud-based automated assistant components 116 (which may be shared amongst multiple automated assistant clients 108). It should also be understood that in some implementations, automated assistant 112 may respond to a request from any user regardless of whether the user is actually “served” by that particular instance of automated assistant 112.

Client device 102 may include, for example, one or more of: a desktop computing device, a laptop computing device, a tablet computing device, a touch sensitive computing device (e.g., a computing device which can receive input via touch from a user), a mobile phone computing device, a computing device of a vehicle of the user (e.g., an in-vehicle communications system, an in-vehicle entertainment system, an in-vehicle navigation system), a standalone interactive speaker, a smart appliance such as a smart television, a projector, and/or a wearable apparatus of the user that includes a computing device (e.g., a watch of the user having a computing device, glasses of the user having a computing device, a virtual or augmented reality computing device). Additional and/or alternative client computing devices may be provided.

The client device 102 can interface with a projector 106 or can include the projector 106. In some implementations, the projector 106 can be a “smart” projector (e.g., the “smart” projector can simply display images it receives from client device 102 and/or can receive relevant data to generate image transformations at the projector before projecting a transformed image). Furthermore, the projector 106 may include, for example, liquid crystal display (LCD) projectors, digital light processing (DLP) projectors, light emitting diode (LED) projectors, hybrid LED and laser diode projectors, and/or laser diode projectors. The projector 106 can be a short throw or ultra-short throw projector. A projected image can be touch sensitive and include a touch interface, which can receive touch inputs and/or gestures that allow a user to control the automated assistant via the touch interface of the projected image. Projectors displaying touch sensitive images can include a variety of infrared sensors, cameras, and/or other sensor(s) to detect a user's gestures and taps to determine how a user is interacting with the projected image.

The automated assistant client 108 can utilize either the projector integrated within client device 102 or a stand-alone projector 106. In many implementations, automated assistant client 108 can utilize both projectors, for example using a different projector for a different situation. For example, automated assistant client 108 can utilize the projector integrated within client device 102 to project still images and stand-alone projector 106 to project a video sequence. The automated assistant client 108 can use different projectors in different lighting conditions depending on the specifications of the specific projectors; for example, stand-alone projector 106 might project better in lower lighting conditions.
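Choosing between the integrated and stand-alone projectors by content type and ambient light can be sketched as a first-match search over projectors in preference order. The `Projector` record, its `max_ambient_lux` field, and the lux figures below are hypothetical stand-ins for real projector specifications, chosen only so that the example mirrors the scenario described above (stills on the integrated projector, video on the stand-alone one in a dim room).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Projector:
    name: str
    content_types: frozenset[str]  # e.g. {"still", "video"}
    max_ambient_lux: float         # hypothetical: brightest room it suits


def choose_projector(projectors: list[Projector], content_type: str,
                     ambient_lux: float) -> Optional[str]:
    """Return the first projector, in preference order, suited to the request."""
    for p in projectors:
        if content_type in p.content_types and ambient_lux <= p.max_ambient_lux:
            return p.name
    return None  # no projector suits this content in this light


integrated = Projector("integrated", frozenset({"still"}), 500.0)
standalone = Projector("stand-alone 106", frozenset({"still", "video"}), 150.0)
print(choose_projector([integrated, standalone], "video", 50.0))  # stand-alone 106
```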

The client device 102 may include one or more presence sensors 104 that are configured to provide signals indicative of detected presence, particularly human presence. Presence sensors may come in various forms and can collect a variety of types of input to the automated assistant 112, such as verbal, textual, graphical, physical (e.g., a touch on a display device including a touch sensitive projector and/or a touch sensitive screen of a computing device), and/or visual (e.g., a gesture) based input. Some client devices 102 may be equipped with one or more digital cameras that are configured to capture and provide signal(s) indicative of movement detected in their fields of view. The client devices 102 may be equipped with presence sensors 104 that detect acoustic (or pressure) waves, such as one or more microphones.

The presence sensors 104 may be configured to detect indications associated with human presence. For example, in some implementations, a client device 102 may be equipped with a presence sensor 104 that detects various types of waves (e.g., radio, ultrasonic, electromagnetic, etc.) emitted by, for instance, a mobile client device 102 carried/operated by a particular user. For example, some client devices 102 may be configured to emit waves that are imperceptible to humans, such as ultrasonic waves or infrared waves, that may be detected by other client devices 102 (e.g., via ultrasonic/infrared receivers such as ultrasonic-capable microphones).

The various client devices 102 may emit other types of human-imperceptible waves, such as radio waves (e.g., Wi-Fi, Bluetooth, cellular, etc.) that may be detected by one or more client devices 102 and used to determine an operating user's particular position. In some implementations, Wi-Fi triangulation may be used to detect a user's position, e.g., based on Wi-Fi signals to/from a client device 102, for example utilizing any of a variety of Wi-Fi SLAM methods. In other implementations, other wireless signal characteristics, signal strength, etc., may be used by various client devices 102, alone or collectively, to determine a particular person's pose based on signals emitted by a client device 102 they carry. Time-of-flight cameras can be used independently as presence sensors 104 to locate the pose of user(s) in an environment.
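As a simple illustration of locating a user from signal strength, the sketch below converts received signal strength into a distance with the log-distance path-loss model and then trilaterates a 2-D position by least squares. This is a simpler technique than the Wi-Fi SLAM methods mentioned above, named plainly here as trilateration; the function names, the 1 m reference power of -40 dBm, and the path-loss exponent of 2 are assumptions for illustration only.

```python
import math

import numpy as np


def rssi_to_distance(rssi_dbm, tx_power_dbm=-40.0, path_loss_exponent=2.0):
    """Log-distance path-loss model; tx_power_dbm is the RSSI at 1 m (assumed)."""
    return 10 ** ((tx_power_dbm - rssi_dbm) / (10 * path_loss_exponent))


def trilaterate(anchors, distances):
    """Least-squares 2-D position from >= 3 anchor positions and distances."""
    a = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    # Subtract the first circle equation from the others to linearize:
    # 2(x_i - x_0) x + 2(y_i - y_0) y = d_0^2 - d_i^2 + |p_i|^2 - |p_0|^2
    A = 2 * (a[1:] - a[0])
    b = d[0] ** 2 - d[1:] ** 2 + np.sum(a[1:] ** 2, axis=1) - np.sum(a[0] ** 2)
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position


# Three client devices at known positions (meters) and a user at (1, 1).
devices = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
ranges = [math.dist(p, (1.0, 1.0)) for p in devices]
print(trilaterate(devices, ranges))  # approximately (1, 1)
```

With noisy real-world ranges, using more than three devices lets the least-squares solve average out some of the error.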

The automated assistant 112 may engage in dialog sessions with one or more users via user interface input and output devices of one or more client devices 102. The dialog sessions can be audio-based, image-based, or a combination of audio and images. In response to the input dialog from the user, the one or more client devices 102 can present selected digital components, such as images, videos, text, or audio, to the user. In some implementations, automated assistant 112 may engage in dialog sessions with a user in response to user interface input provided by the user via one or more user interface input devices of one of the client devices 102. In some of those implementations, the user interface input is explicitly directed to automated assistant 112. For example, a user may speak a predetermined invocation phrase, such as “OK, Assistant,” or “Hey, Assistant,” to cause automated assistant 112 to enter a state where the automated assistant 112 can receive inputs, such as input audio signals, text-based inputs, or touch-based inputs. The inputs can include content requests.

The automated assistant 112 may engage in a dialog session in response to user interface input, even when that user interface input is not explicitly directed to automated assistant 112. For example, automated assistant 112 may examine the contents of user interface input and engage in a dialog session in response to certain terms being present in the user interface input and/or based on other cues. In many implementations, automated assistant 112 may utilize speech recognition to convert utterances from users into text, and respond to the text accordingly, e.g., by providing visual information in the form of a base image and/or a transformed image, by providing search results or general information, and/or by taking one or more response actions (e.g., playing media, launching a game, ordering food, etc.). In some implementations, the automated assistant 112 can additionally or alternatively respond to utterances without converting the utterances into text. For example, the automated assistant 112 can convert voice input into an embedding, into entity representation(s) (that indicate entity/entities present in the voice input), and/or into other “non-textual” representations and operate on such non-textual representations. Accordingly, implementations described herein as operating based on text converted from voice input may additionally and/or alternatively operate on the voice input directly and/or on other non-textual representations of the voice input.

Each of the client computing devices 102 and computing device(s) operating cloud-based automated assistant components 116 may include one or more memories for storage of data and software applications, one or more processors for accessing data and executing applications, and other components that facilitate communication over a network. The operations performed by one or more computing devices 102 and/or automated assistant 112 may be distributed across multiple computer systems. Automated assistant 112 may be implemented as, for example, computer programs running on one or more computers in one or more locations that are coupled to each other through a network.

The client computing device 102 may operate an automated assistant client 108. In various implementations, each automated assistant client 108 may include a corresponding speech capture/text-to-speech (“TTS”)/speech-to-text (“STT”) module 110. In other implementations, one or more aspects of speech capture/TTS/STT module 110 may be implemented separately from the automated assistant client 108.

Each speech capture/TTS/STT module 110 may be configured to perform one or more functions: capture a user's speech, e.g., via a microphone (which in some cases may include presence sensor 104); convert that captured audio to text (and/or to other representations or embeddings); and/or convert text to speech. For example, in some implementations, because a client device 102 may be relatively constrained in terms of computing resources (e.g., processor cycles, memory, battery, etc.), the speech capture/TTS/STT module 110 that is local to each client device 102 may be configured to convert a finite number of different spoken phrases, particularly phrases that invoke automated assistant 112, to text (or other forms, such as lower dimensionality embeddings). Other speech input may be sent to cloud-based automated assistant components 116, which may include cloud-based TTS module 118 and/or cloud-based STT module 120.
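The split between a small on-device vocabulary and cloud recognition can be sketched as a routing function. For illustration only, utterances are represented here as text transcripts rather than audio; a real local module 110 would match audio or embeddings. The `LOCAL_PHRASES` set, `normalize`, and `route` are hypothetical names.

```python
from __future__ import annotations

import re

# Hypothetical on-device grammar: the finite set of phrases handled locally.
LOCAL_PHRASES = frozenset({"ok assistant", "hey assistant"})


def normalize(utterance: str) -> str:
    """Lower-case and drop punctuation so 'OK, Assistant' matches 'ok assistant'."""
    return " ".join(re.sub(r"[^a-z ]", "", utterance.lower()).split())


def route(utterance: str) -> tuple[str, str]:
    """Handle known invocation phrases locally; send everything else to the cloud."""
    phrase = normalize(utterance)
    if phrase in LOCAL_PHRASES:
        return ("local", phrase)
    return ("cloud", utterance)


print(route("OK, Assistant"))                  # ('local', 'ok assistant')
print(route("What is on my calendar today?"))  # ('cloud', 'What is on my calendar today?')
```

Keeping the local set small is the point of the design: the constrained device only needs enough recognition to notice an invocation, and the open-ended request is left to the cloud-based STT module 120.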

Cloud-based STT module 120 may be configured to leverage the resources of the cloud to convert audio data captured by speech capture/TTS/STT module 110 into text (which may then be provided to natural language processor 122). Cloud-based TTS module 118 may be configured to leverage the virtually limitless resources of the cloud to convert textual data (e.g., natural language responses formulated by automated assistant 112) into computer-generated speech output. The TTS module 118 may provide the computer-generated speech output to client device 102 to be output directly, e.g., using one or more speakers. In other implementations, textual data (e.g., natural language responses) generated by automated assistant 112 may be provided to speech capture/TTS/STT module 110, which may then convert the textual data into computer-generated speech that is output locally.

Automated assistant 112 (e.g., cloud-based automated assistant components 116) may include a natural language processor 122, the TTS module 118, the STT module 120, transformation parameters 124, the digital component selector 126, and other components. In some implementations, one or more of the engines and/or modules of automated assistant 112 may be omitted, combined, and/or implemented in a component that is separate from automated assistant 112. In some implementations, to protect privacy, one or more of the components of automated assistant 112, such as natural language processor 122, speech capture/TTS/STT module 110, etc., may be implemented at least in part on client device 102 (e.g., to the exclusion of the cloud).

The automated assistant 112 can generate or select responsive content (e.g., digital components) in response to various inputs generated by a user of client device 102 during a human-to-computer dialog session with automated assistant 112. Automated assistant 112 may provide the responsive content (e.g., over one or more networks 114 when separate from a client device of a user) for presentation to the user as part of the dialog session. For example, automated assistant 112 may generate responsive content in response to free-form natural language input provided via client device 102. As used herein, free-form input is input that is formulated by the user and that is not constrained to a group of options presented for selection by the user.

Natural language processor 122 of automated assistant 112 processes natural language input generated by users via client device 102 and may generate annotated output for use by one or more components of automated assistant 112. For example, the natural language processor 122 may process natural language free-form input that is generated by a user via one or more user interface input devices of client device 102. The generated annotated output includes one or more annotations of the natural language input and optionally one or more (e.g., all) of the terms of the natural language input. Natural language processor 122 can parse the input to identify the content request and one or more keywords in the input.

The natural language processor 122 can identify and annotate various types of grammatical information in natural language input. For example, the natural language processor 122 may include a part of speech tagger configured to annotate terms with their grammatical roles. Also, for example, in some implementations the natural language processor 122 may additionally and/or alternatively include a dependency parser configured to determine syntactic relationships between terms in natural language input.

The natural language processor 122 can include an entity tagger configured to annotate entity references in one or more segments, such as references to people (including, for instance, literary characters, celebrities, public figures, etc.), organizations, locations (real and imaginary), and so forth. The entity tagger of the natural language processor 122 may annotate references to an entity at a high level of granularity (e.g., to enable identification of all references to an entity class such as people) and/or a lower level of granularity (e.g., to enable identification of all references to a particular entity such as a particular person). The entity tagger may rely on content of the natural language input to resolve a particular entity and/or may optionally communicate with a knowledge graph or other entity database to resolve a particular entity.

In some implementations, the natural language processor 122 may additionally and/or alternatively include a coreference resolver configured to group, or “cluster”, references to the same entity based on one or more contextual cues. For example, the coreference resolver may be utilized to resolve the term “there” to “Hypothetical Café” in the natural language input “I liked Hypothetical Café last time we ate there.”

One or more components of the natural language processor 122 can use annotations from one or more other components of the natural language processor 122. For example, in some implementations, the named entity tagger may rely on annotations from the coreference resolver and/or dependency parser in annotating all mentions of a particular entity. Also, for example, in some implementations the coreference resolver may rely on annotations from the dependency parser in clustering references to the same entity. In many implementations, in processing a particular natural language input, one or more components of the natural language processor 122 may use related prior input and/or other related data outside of the particular natural language input to determine one or more annotations.

The natural language processor 122 can determine a request, such as a content request, within an audio input request received from the client device 102. The digital component selector 126 can be a part of the cloud-based automated assistant components 116 or separate from the cloud-based automated assistant components 116. The digital component selector 126 can receive the content request or an indication thereof. The content selector component 126 can receive prior audio inputs (or a packaged data object) for the selection of a digital component based on the content request. The content selector component 126 can execute a real-time digital component selection process to select the digital component. The content selector component 126 can select additional or supplemental digital components based on the input request.

The real-time digital component selection process can refer to, or include, selecting digital component objects (which may include sponsored digital component objects) provided by third party content providers 128. The real-time content selection process can include a service in which digital components provided by multiple content providers are parsed, processed, weighted, or matched based on the packaged data object in order to select one or more digital components to provide to the client device 102. For example, a plurality of content provider devices can each provide a digital component with an associated bid to the digital component selector 126. Based on a ranking of the bids from each of the content provider devices, the digital component selector 126 can select one of the provided digital components. The digital component selector 126 can perform the content selection process in real-time. Performing the content selection process in real-time can refer to performing the content selection process responsive to the request for content received via the client device 102. The real-time content selection process can be performed (e.g., initiated or completed) within a time interval of receiving the request (e.g., 5 seconds, 10 seconds, 20 seconds, 30 seconds, 1 minute, 2 minutes, 3 minutes, 5 minutes, 10 minutes, or 20 minutes). The real-time content selection process can be performed during a communication session with the client device 102, or within a time interval after the communication session is terminated.
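The bid-ranking step and the real-time window can be sketched as follows. Ranking purely by bid is a simplification (the text also mentions parsing, weighting, and matching candidates against the packaged data object), and the `Candidate` record, `select_component` function, and 5-second default, which is one of the example intervals listed above, are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    provider: str
    component: str
    bid: float


def select_component(candidates: list[Candidate], request_time_s: float,
                     now_s: float, max_latency_s: float = 5.0) -> Optional[Candidate]:
    """Pick the highest-bid candidate if still inside the real-time window."""
    if now_s - request_time_s > max_latency_s:
        return None  # too late to count as real-time for this request
    if not candidates:
        return None
    # max() keeps the first candidate on ties, i.e. the earliest submitted bid.
    return max(candidates, key=lambda c: c.bid)


bids = [
    Candidate("provider-1", "Brand A cola", 1.20),
    Candidate("provider-2", "Brand B ginger ale", 2.50),
    Candidate("provider-3", "Brand E purified water", 0.80),
]
winner = select_component(bids, request_time_s=100.0, now_s=101.5)
print(winner.component)  # Brand B ginger ale
```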

For example, the digital component selector 126 can be designed, constructed, configured or operational to select digital component objects based on the content request in the input audio signal. The digital component selector 126 can identify, analyze, or recognize voice, audio, terms, characters, text, symbols, or images of the candidate digital components using an image processing technique, character recognition technique, natural language processing technique, or database lookup. The candidate digital components can include metadata indicative of the subject matter of the candidate digital components, in which case digital component selector 126 can process the metadata to determine whether the subject matter of the candidate digital component corresponds to the content request.

Responsive to the request identified in the input audio (or other) signal, the content selector component 126 can select a digital component object from a database associated with the digital component provider 128 and provide the digital component for presentation via the client device 102. The digital component object can be provided by a digital component provider 128. The content selector component 126 can select multiple digital components. The multiple digital components can be provided by different digital component providers 128. For example, a first digital component provider 128 can provide a primary digital component responsive to the request and a second digital component provider 128 can provide a supplemental digital component that is associated with or relates to the primary digital component. The client device 102 or a user thereof can interact with the digital component object. The client device 102 can receive an audio, touch, or other input response to the digital component. The client device 102 can receive an indication to select a hyperlink or other button associated with the digital component object that causes or allows the client device 102 to identify digital component provider 128, request a service from the digital component provider 128, instruct the digital component provider 128 to perform a service, transmit information to the digital component provider 128, or otherwise identify a good or service associated with digital component provider 128.

The digital component selector 126 can select a digital component that includes text, strings, or characters that can be processed by a text to speech system or presented via a display. The digital component selector 126 can select a digital component that is in a parameterized format configured for a parametrically driven text to speech technique. The digital component selector 126 can select a digital component that is in a format configured for display via client device 102 or the projector 106. The digital component selector 126 can select a digital component that can be re-formatted to match a native output format of the client device 102, application, or projector 106 to which the digital component is transmitted. The digital component selector 126 can provide the selected digital component to the client device 102, the automated assistant client 108, or an application executing on the client device 102 for presentation by the client device 102 or the projector 106.

The automated assistant 112 can generate dynamic image transformations to display a base image and/or a transformed version of the base image, such as the digital components selected by the content selector component 126. The automated assistant 112 can identify one or more active users, generate image transformations to display for active user(s), and/or generate a base image to project based on the distance of a user from the projected image.

The cloud-based automated assistant components 116 may generate transformation parameters 124. In other implementations, transformation parameters 124 may be generated separately from cloud-based automated assistant components 116, e.g., on client device 102 by automated assistant client 108 and/or on another computer system (e.g., in the so-called “cloud”).

The transformation parameters 124 can be used by automated assistant client 108 or cloud-based automated assistant components 116 to generate a transformed digital component from a base digital component. For example, the transformation parameters can be used to generate a transformed image from a base image. For example, transformation parameters 124 can include identification information for a user's position within an environment. Warping parameters can be a specific type of transformation parameters that automated assistant client 108 and/or cloud-based automated assistant components 116 can use to warp a base image into a transformed image. Warping parameters may include, for example, one or more of: the pose of a user, the gaze of a user, the facial identification of a user (with approval of the user), the voice identification of a user (with approval of the user), the distance from a projector to the surface an image is projected onto, the shape of the surface the image is projected onto, or any combination thereof. In some implementations, automated assistant 112 can perform image warping (a linear transformation similar to image rectification, described in more detail below) using transformation parameters 124 to generate a transformed image from a base image and/or an additional base image.
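One common way to realize such a warp is a planar projective transform (homography): a 3x3 matrix that is linear in homogeneous coordinates and maps the corners of the base image to the quadrilateral they must occupy so that the projection appears rectified from the user's pose. The sketch below estimates that matrix from four corner correspondences with the direct linear transform (DLT) and NumPy's SVD. How the target corners are derived from the user's pose, gaze, and surface geometry is not shown and is assumed to be computed elsewhere; the function names and corner coordinates are made up for illustration.

```python
import numpy as np


def homography_from_points(src, dst):
    """Estimate the 3x3 matrix H mapping each src (x, y) to dst (u, v) via DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # fix the arbitrary scale so that H[2, 2] == 1


def apply_homography(h, points):
    """Map 2-D points through H, dividing out the homogeneous coordinate."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]


# Base image corners in normalized coordinates, and the (made-up) keystoned
# quadrilateral they should occupy for a viewer standing off to one side.
base_corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
target_corners = [(0.0, 0.0), (1.0, 0.1), (1.0, 0.9), (0.0, 1.0)]
H = homography_from_points(base_corners, target_corners)
print(np.round(apply_homography(H, base_corners), 6))
```

To warp actual pixels, libraries such as OpenCV provide `cv2.getPerspectiveTransform` to compute the same kind of matrix from four point pairs and `cv2.warpPerspective` to resample a whole image through it.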

A user's pose can be determined via presence sensors 104, and the automated assistant 112 can use the distance from the user (which can be generated from the user's pose) to the projection surface (e.g., the location where the image is projected) to identify a base image to use for that particular user. The information contained in the base image can be dependent on the distance from the user to the projection surface. For example, a base image identified for a user located far from a surface can contain limited information from the user's calendar, such as only the next item on the user's calendar. In contrast, a base image identified for a user located near the projection surface can contain more detailed information from the user's calendar, such as the user's schedule for the entire day. If the user is close enough to the projection surface to touch the projection, in many implementations, the base image can also contain touch sensitive elements, for example, the ability for the user to scroll through calendar events for the entire week.
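The distance-dependent choice of base image content described above can be sketched as a three-tier rule. The thresholds `TOUCH_RANGE_M` and `NEAR_RANGE_M`, the function name, and the returned dictionary are hypothetical (the source gives no numeric distances), and the sketch assumes `today_events` is already sorted so that its first entry is the next calendar item.

```python
import math

# Hypothetical distance thresholds in meters; the source gives none.
TOUCH_RANGE_M = 0.8
NEAR_RANGE_M = 2.5


def base_image_content(user_pos, surface_pos, today_events, week_events):
    """Choose how much calendar detail the base image shows."""
    distance = math.dist(user_pos, surface_pos)
    if distance > NEAR_RANGE_M:
        # Far away: only the next calendar item.
        return {"items": today_events[:1], "scrollable": False}
    if distance > TOUCH_RANGE_M:
        # Nearby: the schedule for the entire day.
        return {"items": list(today_events), "scrollable": False}
    # Within reach: the whole week, with touch-scrollable elements.
    return {"items": list(week_events), "scrollable": True}


today = ["09:00 stand-up", "13:00 lunch"]
week = today + ["Tue 10:00 design review"]
print(base_image_content((0.0, 4.0, 0.0), (0.0, 0.0, 0.0), today, week))
# {'items': ['09:00 stand-up'], 'scrollable': False}
```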

Client device 102 and/or cloud-based automated assistant components 116 can be in communication with one or more third party agents hosted by remote device(s) (e.g., another cloud-based component). For instance, a user voice command “order a large pepperoni pizza from Agent X” can cause the automated assistant client 108 (or cloud-based automated assistant component(s) 116) to send an agent command to a third party agent “Agent X”. The third party agent can be a digital component provider 128. The agent command can include, for example, a request that can include an intent value that indicates an “ordering” intent determined from the voice command, as well as optional slot values such as “type=pizza”, “toppings=pepperoni”, and “size=large.” In response, the third party agent can provide, to the automated assistant 112, responsive content that includes (or enables generation of) base digital components relevant to the pizza order. For example, the base digital component can be a base image that can include graphical representations of the order being confirmed, as well as of a status of the pizza order. The content selector component 126 can also select supplemental or additional digital components to display in association with the base digital component. For example, the content selector component 126 can select an additional digital component that can include a video sequence of real time tracking of the pizza delivery driver on a map as the pizza is being delivered. Once these base image(s) are received from the third party agent, the automated assistant client 108 and/or the cloud-based automated assistant components 116 can generate a transformation of the base image(s), and a transformed image can be projected onto the wall for the user.
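The agent command in this example (an intent value plus optional slot values) can be sketched as a small structure built from the voice command. The regular expression below is a toy stand-in for the natural language processor 122 and only understands this one sentence shape; `ORDER_PATTERN`, `build_agent_command`, and the dictionary layout are hypothetical and do not reflect any real agent protocol.

```python
import re
from typing import Optional

# Toy grammar standing in for the natural language processor (hypothetical).
ORDER_PATTERN = re.compile(
    r"order an? (?P<size>small|medium|large) (?P<toppings>\w+) (?P<type>pizza)"
    r" from (?P<agent>.+)",
    re.IGNORECASE,
)


def build_agent_command(utterance: str) -> Optional[dict]:
    """Map a spoken order to an agent command with an intent and slot values."""
    match = ORDER_PATTERN.fullmatch(utterance.strip())
    if match is None:
        return None  # not an order this toy grammar understands
    slots = match.groupdict()
    agent = slots.pop("agent")
    return {"agent": agent, "intent": "ordering", "slots": slots}


print(build_agent_command("order a large pepperoni pizza from Agent X"))
```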

FIGS. 2A and 2B illustrate an example of a user viewing an image projected onto a wall with respect to different user locations. Image 200 contains a scene of a room at a first time and is illustrated in FIG. 2A. Image 200 contains user 202, projected image 204, client device 206 (that includes an integrated projector and/or is in communication with a locally accessible separate projector), and table 208. It will be understood that image 200 is merely illustrative. For example, the client device and/or projector can be separate devices; the client device and/or projector can be on a surface other than a table, such as a desk or a dresser, and/or can be mounted onto surfaces such as a wall and/or ceiling; more than one client device can be present within the room; more than one projector can be within the room; and/or more than one user can be in the room. Client device 206 can determine the pose of user 202. If client device 206 determines that user 202 is an active user, the client device can project a variety of images 204 for the user, including a base image and/or a transformed image that is a transformation of the base image.

The same room captured at a second time is illustrated in FIG. 2B. Image 225 is a scene of the room captured at the second time. Image 225 contains the same client device 206 (that includes an integrated projector and/or is in communication with a locally accessible separate projector) and table 208, but the pose of the user has changed to a second pose 226. As an illustrative example, the user has moved to the right, from previous pose 228, to the second pose 226. Client device 206 can detect the user in the second pose 226 and determine whether the user is an active user in the second pose. If the user is an active user in the second pose, projected image 230 can be projected onto the wall for the user in the second pose 226. Projected image 230 can be projected onto the same position of the wall as projected image 204, even though image transformation processes can change the content of the images themselves. In a variety of implementations, the contents of projected image 230 can change as the user moves to the second pose 226. For example, projected image 204 can be a first transformed image and projected image 230 can be a second transformed image, both of the same base image. In other words, client device 206 can generate image transformations such that user 202 in the first pose and the user in the second pose 226 will perceive the same projected image, even though the user's perspective relative to the position of the image on the wall has changed. In many implementations, client device 206 can use a base image and can perform image transformations to project transformed images for both user poses, such that projected image 204 for user 202 in the first pose is a first transformed image and projected image 230 for the user in the second pose 226 is a second transformed image. In contrast, if user 202 has a viewpoint perpendicular to (e.g., directly in front of) image 204, client device 206 can project the untransformed base image as projected image 204 (and similarly client device 206 can project a transformed image as projected image 230 for the user in the second pose 226).

The client device 206 can project additional digital components for viewing by the users near the client device 206. For example, the client device 206 can project an additional image 232 onto the wall. The subject matter of the additional image 232 can be related to the subject matter of the projected image 230. For example, the additional image 232 can provide additional information regarding the content of the projected image 230. The additional image 232 can also be a duplicate of the projected image 230 that is projected onto a different location of the wall. For example, the client device 206 can project the additional image 232 as a duplicate of the projected image 230 at the different location to improve the ability of a second user to view the projected content.

The client device 206 can generate the additional image 232 and project it to a location selected to be viewed from a user pose other than the user pose 226 for which the location of the projected image 230 is selected. The client device 206 can select the location of the additional image 232 such that the additional image is displayed in a non-prominent location. For example, and continuing the above pizza ordering example, the pizza tracker information can be included in the additional image 232. In this example, the pizza tracker information can be displayed in a non-prominent location of the wall, such as near a corner, near the intersection with the floor, near the intersection with an adjacent wall, or near an object positioned near the wall (such as a lamp positioned in front of the wall).

The content selector component 126 can select digital components for the client device 206 to project onto the wall that are not related to a request parsed from an input signal. The content selector component 126 can select digital components to be projected on the wall between the times that the client device 206 receives input signals. For example, the content selector component 126 can select supplemental digital components based on the location, context, or action of the client device, or based on the preferences of a user associated with the client device 206. For example, between the times that the client device 206 receives input signals, the client device can project supplemental digital components onto the wall, such as weather information, time information, or other digital components selected or configured by the user of the client device 206.

Example images further illustrating a base image, and the same base image viewed by a user from a non-perpendicular angle (e.g., from the side) without dynamic image transformation, are illustrated in FIGS. 2C and 2D. A base image can be directly projected onto a wall for a user whose perspective of the image is directly perpendicular to the wall. Additionally, a base image can be transformed using a variety of dynamic image transformations such that a client device can project a transformed image that will appear substantially similar (if not identical) to the base image as the user's perspective relative to the location of the projected image within the room changes.

FIG. 2C contains image 250, which contains dashed line 252 and dashed line 254. Dashed line 252 and dashed line 254 are parallel lines (i.e., lines that are equidistant and will never meet). In many implementations, image 250 can be an example of a base image used by a client device, which can be processed using image transformations such that dashed line 252 and dashed line 254 appear parallel to a user when projected onto a wall, regardless of the pose of the user in the room. For example, when a user is viewing the projected image from a non-perpendicular angle (e.g., from the side), image transformations can make the lines in the transformed base image still appear parallel. Additionally, when image 250 is projected onto a surface by a client device as a base image and viewed by a user from a direction perpendicular to (e.g., directly in front of) the surface, dashed line 252 and dashed line 254 appear parallel.

In contrast, FIG. 2D contains image 275, made of dashed line 276 and dashed line 278. Dashed line 276 and dashed line 278 are non-parallel and slant towards each other on the left hand side of the image. In a variety of implementations, lines 276 and 278 are an example of how a base image of parallel lines (for example, if image 250 containing dashed parallel lines was utilized as a base image) could be viewed by a user from a perspective in the room other than perpendicular to the image if no image transformation was used on the image projection. As an example, a pair of lines closer together on the left side and further apart on the right side can indicate that a user is standing to the right of a base image of parallel lines projected without image transformation, because the right side of the projection is nearer to that user and therefore appears larger. This user pose can be similar to the user in the second pose 226 after the user has moved to the right side of the room in FIG. 2B.

Client device 206 can generate an image transformation on a base image similar to the parallel lines of image 250 in FIG. 2C such that a user in the second pose 226 will see a transformed image, generated by client device 206, with parallel lines that look like image 250, instead of the non-parallel lines in image 275 that the user would otherwise see from a non-perpendicular viewing position, as illustrated in FIG. 2D. Additionally, the user in the second pose 226 will see a projected image that is the same (or substantially similar) size as the base image and that is in the same (or substantially similar) location as the base image.
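One way to picture this correction is as a pre-warp: if the oblique view is modeled as a projective mapping (a 3x3 homography) from projected coordinates to perceived coordinates, then projecting the base image warped by the inverse mapping cancels the distortion. The sketch below illustrates that idea under stated assumptions; `H_VIEW` is a hypothetical distortion chosen so that, as in FIG. 2D, parallel lines appear closer together on the left, and it is not derived from a real user pose.

```python
def apply_h(h, x, y):
    """Apply a 3x3 homography to point (x, y) using homogeneous coordinates."""
    xs = h[0][0] * x + h[0][1] * y + h[0][2]
    ys = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return xs / w, ys / w

def inv3(m):
    """Invert a 3x3 matrix via its adjugate (assumes the matrix is non-singular)."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]

# Hypothetical viewing distortion for a user standing to the right of the image:
# points further right are magnified, so parallel lines seem to converge on the
# left. Only meaningful where w > 0, i.e., for image x coordinates below 20.
H_VIEW = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-0.05, 0.0, 1.0]]
H_PREWARP = inv3(H_VIEW)

def perceived(point, prewarp=True):
    """What the user sees for a base-image point, with or without pre-warping."""
    x, y = point
    if prewarp:
        x, y = apply_h(H_PREWARP, x, y)  # the transformed image that is projected
    return apply_h(H_VIEW, x, y)         # distortion from the oblique viewpoint
```

With the pre-warp, every base-image point is perceived at its original coordinates; without it, the gap between two parallel lines grows toward the right.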

A process for dynamic image transformation using image warping in accordance with various implementations is illustrated in FIG. 3. The process 300 can be performed by one or more client devices and/or any other apparatus capable of interacting with an automated assistant. The process includes identifying (302) a base image. The base image can be used by an automated assistant (at the client device and/or at remote device(s)) to generate a transformed image in dependence on a pose of a user. For example, a base image can include any of a variety of information that a client device can cause a projector to project for presentation to the user, such as weather information for a particular day. In many implementations, a user can request information that can be included in a base image through verbal, textual, graphical, and/or visual input to the client device. For example, a user can ask the client device “OK Assistant—what is the weather tomorrow” and a base image can be identified containing weather information for the following day. Moreover, base images can be provided to a client device by a third party agent (optionally provided via the cloud-based automated assistant component(s) 116). For example, if a user orders a delivery from a restaurant, the third party agent associated with the restaurant can send the automated assistant a base image which can include status updates relating to the delivery. Yet further, in various implementations the automated assistant can generate base images and provide them for projection independent of explicit user input. For example, an image that contains a weather forecast can be projected in response to detecting the presence of a user, but without explicit input from the user.

A first pose of a user within a room can be determined (304). A variety of sensors in the client device, such as presence sensors 104 in FIG. 1, can be used to determine the first pose of the user, such as a microphone, a camera, an infrared camera, a time of flight camera, and/or a touch screen display on the client device. For example, a client device can use a camera to detect a user and generate the pose of the user within the room. In several implementations, a client device with many sensors can determine which sensors to use, individually and/or in combination, based on previously known information regarding the pose of a user. For example, when little information is known about a user's pose location, a sensor which gathers information about the entire room, such as a time of flight camera, can be used. Alternatively, if a user is interacting with the client device at the touch screen, it can safely be assumed that the user is close to the client device, and a shorter range sensor could be used to determine the first pose of the user. Additionally or alternatively, sensors integrated into third party agents can be used to determine a pose of a user. It will be understood that these examples are merely illustrative, and any of a variety of ways to detect user pose in a room can be utilized as appropriate in accordance with various implementations.
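The sensor-selection heuristic above can be sketched as a small policy function. The sensor name strings and the exact fallback order are hypothetical; this is one possible reading of the examples, not the described system's logic.

```python
from typing import Optional, Set, Tuple

def choose_sensors(available: Set[str], touch_active: bool,
                   known_position: Optional[Tuple[float, float]]):
    """Choose which (hypothetically named) sensors to use for pose estimation."""
    if touch_active and "touch_screen" in available:
        # Touch interaction implies the user is next to the device, so
        # short-range sensing is sufficient.
        return [s for s in ("touch_screen", "camera") if s in available]
    if known_position is None and "time_of_flight_camera" in available:
        # Little is known about where the user is: use a room-scale sensor.
        return ["time_of_flight_camera"]
    # Otherwise, combine whichever general-purpose sensors are present.
    general = {"camera", "infrared_camera", "microphone"}
    return sorted(s for s in available if s in general)
```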

First image transformation parameters can be generated (306) using the first pose of the user to warp the base image. Image warping can include a linear transformation process which can use image warping parameters, such as a user's pose, position, gaze, facial identification (with approval of the user), voice identification (with approval of the user), and/or distance from the projector to the surface the image is projected onto, to generate image transformation parameters. A process for determining a transformation to warp an image in accordance with many implementations is discussed below with respect to FIG. 4. In many implementations, first image transformation parameters can include, for example (but are not limited to), base image data, precalculated image transformation data for a base image, first user pose data, and/or any other information relevant to generating a transformed image. In several implementations, first image transformation parameters can include image warping parameters.

A first transformed image can be generated (308) using the first image transformation parameters, where the first transformed image is a transformation of the base image. In general, a transformed image can differ from the base image in content, but is generated in such a way that it appears substantially similar in size and location to the base image when projected onto the surface. In many implementations, an image transformation is not necessary when the user, in the first pose, views the image perpendicular to (e.g., directly in front of) the surface the image is projected onto. In this specific example, the first image transformation parameters can indicate to the client device that generating a first transformed image is unnecessary, and the base image will take the place of the first transformed image. Furthermore, the base image, being identical to itself, will take up an identical size and position on the wall. In many implementations, while a transformed image will differ from its base image counterpart, it can be the same size as the base image and/or can, when projected, be projected in the same location and at the same size as the base image would be. For example, when a base image and a transformed image are projected, they can be of an identical size on a projection surface and in identical locations on the projection surface. In some implementations, the transformed image is projected in a “substantially similar” position as the base image. For example, the first transformed image and the base image can be identical in size, but the two images are not projected onto exactly the same position of the wall, and the transformed image can take up 95% of the same position on the wall as the base image. Alternatively, the transformed image can be slightly smaller than the base image: while the transformed image takes up the same position on the wall, because it is slightly smaller it only takes up 90% of the wall space of the base image (but does not take up any wall space outside of the area of the original base image). Additionally, a transformed image can be slightly larger than a base image, taking up 105% of the wall space of the base image, and still take up a substantially similar position on the wall.
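The 95%, 90%, and 105% figures above can be made concrete by treating each projected footprint as an axis-aligned rectangle `(x0, y0, x1, y1)` in wall coordinates. This is a sketch under that simplifying assumption, and the 10% tolerance in `substantially_similar` is a hypothetical value, not one given in the description.

```python
def area(r):
    """Area of an axis-aligned rectangle (x0, y0, x1, y1); empty rectangles give 0."""
    x0, y0, x1, y1 = r
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def intersection(a, b):
    """Overlapping region of two rectangles (may be empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def footprint_stats(base, transformed):
    """Return (share of the base footprint covered, transformed size relative to base)."""
    covered = area(intersection(base, transformed)) / area(base)
    relative_size = area(transformed) / area(base)
    return covered, relative_size

def substantially_similar(base, transformed, tol=0.10):
    """Hypothetical test: most of the base area is covered and the size is close."""
    covered, rel = footprint_stats(base, transformed)
    return covered >= 1.0 - tol and abs(rel - 1.0) <= tol
```

For a 2 x 1 base image, a same-size image shifted by 0.1 covers 95% of the base footprint, and an image 2.1 wide occupies 105% of the base area while still covering all of it.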

The projector can be caused (310) by the client device to project the first transformed image onto a surface. The projector can be integrated within a client device similar to client device 102 and/or can be a separate projector similar to projector 106. The surface can include various areas within a room, including a wall, the ceiling, and/or the floor. For example, if the user in the first pose is determined to be reclining on a couch, the ceiling (instead of the wall) might be a more useful location for the user to view the projected transformed image.

A second pose of the user can be determined (312) after the user has moved. User movement can be determined using many of the same sensors used to detect the first pose of the user in step (304), including a microphone, a camera, an infrared camera, a time of flight camera, and/or a touch screen display on the client device. For example, a microphone that detected a user speaking in one pose in the room can detect when the user's voice is coming from a different direction, and thus that the user has moved within the room. Once a client device has determined the user has moved, the second pose of the user can be determined in a manner similar to determining the first pose of the user in step (304), including using a sensor and/or a combination of sensors.

Second image transformation parameters can be generated (314) using the second pose of the user to warp the image. As previously described, image warping can be a linear transformation process which can utilize image warping parameters to generate image transformation parameters. A process for determining a transformation to warp an image in accordance with many implementations is discussed below with respect to FIG. 4. In some implementations, second image transformation parameters can include, for example (but are not limited to), base image data, precalculated image transformation data for the base image, first user pose data, second user pose data, and/or any other information about a user that is relevant to generating a transformed image. In many implementations, second image transformation parameters can include image warping parameters.

A second transformed image can be generated (316) using the second image transformation parameters, where the second transformed image is a transformation of the base image or of an additional base image. Generating a second transformed image can be performed in a manner similar to step (308) described above. Additionally or alternatively, when a base image changes over time, an additional base image can be used in place of the base image to generate the second transformed image. For example, a projected calendar event base image can change once the event has ended, and an additional base image for the next calendar event can be projected. Additionally, individual frames of a video sequence can make up a corresponding sequence of base images. In this case, the base image can change to an additional base image independent of user movement, and additional image transformations can be calculated for the first pose of the user for the additional base images.

The projector can be caused (318) by the client device to project the second transformed image onto the surface. The second transformed image can be projected in a manner similar to the projection of the first transformed image in step (310) above. However, if the client device detects that the viewpoint of the user has substantially changed and a different surface would be preferable, the client device can instruct the projector to project the second transformed image onto a different surface. For example, if the user in the first pose is determined to be reclining on a couch and the first transformed image is projected onto the ceiling, and the client device detects that the user gets up from the couch and moves within the room into the second pose, the second transformed image can be projected onto a wall instead of the ceiling. Similarly, for example, if a user is facing north in the first pose, the first transformed image can be projected onto the north wall. In some implementations, if the user moves to face south in the second pose, the second transformed image can be projected onto the south wall.
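The surface-switching examples can be sketched as a simple rule, assuming a room whose walls face the four compass directions and a coarse posture label from pose estimation. `choose_surface`, `facing_wall`, and the posture strings are hypothetical.

```python
COMPASS_WALLS = ("north", "east", "south", "west")

def facing_wall(heading_deg):
    """Map a heading in degrees (0 = north, 90 = east) to the nearest compass wall."""
    return COMPASS_WALLS[round((heading_deg % 360.0) / 90.0) % 4]

def choose_surface(posture, heading_deg):
    """Pick a projection surface from a coarse posture label and facing direction."""
    if posture == "reclining":
        # A user lying on a couch is assumed to be looking up at the ceiling.
        return "ceiling"
    # Otherwise project onto the wall the user is facing.
    return f"{facing_wall(heading_deg)} wall"
```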

A process for determining a transformation to warp an image in accordance with various implementations is illustrated in FIG. 4. The process 400 can be performed by one or more client devices and/or any other apparatus capable of interacting with an automated assistant for generating image transformation parameters by image warping. The process 400 can include identifying (402) image warping parameters. Image warping parameters can include (but are not limited to) the pose of a user, the gaze of a user, the facial identification of a user (with approval of the user), the voice identification of a user (with approval of the user), the distance from a projector to the surface an image is projected onto, and/or any other of a variety of user and/or hardware related parameters which can be used as an image warping parameter. In many implementations, a client device can use a single image warping parameter, such as the gaze of the user, to determine a transformation to warp an image. Alternatively, in several implementations, a client device can use a combination of image warping parameters.

Individual sensors available to a client device can impact which warping parameters are available to the client device. For example, if a client device has sensors which can determine the gaze of a user, the client device can use gaze as an image warping parameter. Additionally, in some implementations, a client device can receive data to use as warping parameters from sensors in third party agents, such as a “smart” thermostat and/or other “smart” devices located within the room.

Image warping parameters can be identified by a client device individually and/or in combination. In various implementations, a combination of warping parameters can be identified by the particular sensors available to the client device, the client device itself can determine the combination, and/or the user can identify image warping parameters by predetermining which combination of warping parameters the client device should use. For example, a client device can use the gaze of a user as a warping parameter. However, the client device can choose not to use the gaze of the user as a warping parameter at night, when the lights are off in the room and the client device can therefore have a harder time determining the gaze of a user. Instead, the client device can identify other warping parameters, such as a position of the user determined using voice identification (which is often less sensitive to lighting conditions in the room).
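The selection logic described above, in which sensors determine which warping parameters are available, gaze is dropped in the dark, and a user-set combination is honored when possible, can be sketched as follows. The sensor-to-parameter mapping, the lux threshold, and all names are hypothetical assumptions for illustration.

```python
# Hypothetical mapping from sensor type to the warping parameters it can supply.
SENSOR_PARAMS = {
    "camera": {"pose", "gaze"},
    "microphone": {"voice_position"},
    "depth_sensor": {"pose", "projector_distance"},
}
MIN_LUX_FOR_GAZE = 10.0  # hypothetical lighting threshold for reliable gaze tracking

def select_warping_parameters(sensors, room_lux, preferred=None):
    """Return the warping parameters to use, honoring a user-set combination."""
    available = set()
    for sensor in sensors:
        available |= SENSOR_PARAMS.get(sensor, set())
    if room_lux < MIN_LUX_FOR_GAZE:
        # Gaze is hard to estimate in the dark; fall back on parameters that are
        # less sensitive to lighting, such as a voice-based position.
        available.discard("gaze")
    if preferred is not None:
        chosen = [p for p in preferred if p in available]
        if chosen:
            return chosen
    return sorted(available)
```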

Determining transformation parameters to warp a base image can depend on (but is not limited to) which warping parameters are used by the client device and the values of the data contained in those warping parameters. In many implementations, different adjustments and/or combinations of adjustments can be made using image warping parameters to generate image transformation parameters which can warp a base image as an image transformation. For example, a base image with a particular set of image warping parameters can require only one adjustment to determine transformation parameters, such as only a vertical rotation of the base image. In contrast, in several implementations, when a user is in a different position, a base image can have a different set of image warping parameters and can require a combination of adjustments, such as a horizontal rotation of the base image and a scaling of the base image. It will be understood that these examples are merely illustrative, and any number of combinations of image adjustments can be made by a client device as appropriate to determine transformation parameters in accordance with various implementations.

A rotation of the base image can optionally be determined (404) by the client device. Image rotations can spin an image on the same plane as the wall the image is projected onto (i.e., on a plane parallel with the projection surface). For example, a base image can be a rectangle that, when viewed by the user perpendicular to (e.g., directly in front of) a square wall, has lines parallel with the ceiling, floor, and walls. Image warping parameters can determine an appropriate rotation of the base image to determine transformation parameters to generate a transformed image that is also a rectangle with lines parallel to the ceiling, floor, and walls when viewed from a pose of the user that is non-perpendicular to (e.g., to the side of) the square wall.

A scaling of the base image can optionally be determined (406) by the client device. Image scaling adjusts the size of an image. Additionally or alternatively, image warping parameters can determine an appropriate scaling of a base image to generate image transformation parameters which can increase and/or decrease the size of the base image when they are utilized to generate a transformed image, such that the transformed image, when projected and viewed from a non-perpendicular pose of the user, takes up the same and/or a substantially similar amount of space on the wall as the base image.

A skew adjustment of the base image can optionally be determined (408) by the client device. In general, a skewed image is an image at an oblique angle (i.e., at a slant). Image warping parameters can determine how to skew an image to generate image transformation parameters for use in generating a transformed image, which can change the angles within the base image (i.e., slant portions of the image) such that the determined transformation parameters generate a transformed image that appears non-skewed when viewed from a non-perpendicular pose of the user.

Transformation parameters to warp the base image as an image transformation can be determined (410) by the client device. As previously described, the number of image adjustments necessary to determine transformation parameters to warp a particular base image with a particular set of image warping parameters can vary based on a number of factors, including the base image and/or one or more of the warping parameters.
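One way to realize steps (404)-(410) is to express each optional adjustment as a 3x3 matrix and multiply the determined ones into a single matrix that serves as the transformation parameters. This is a sketch under assumptions: the rotation is the in-plane rotation described above, and the order (skew, then scale, then rotate) and all function names are hypothetical choices.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)] for r in range(3)]

def rotation(theta):
    """In-plane counterclockwise rotation by theta radians (step 404)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scaling(sx, sy):
    """Axis-aligned scaling (step 406)."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def skew(kx, ky):
    """Shear: x += kx * y and y += ky * x (step 408)."""
    return [[1.0, kx, 0.0], [ky, 1.0, 0.0], [0.0, 0.0, 1.0]]

def transformation_parameters(rotation_rad=None, scale=None, skew_k=None):
    """Fold whichever adjustments were determined into one matrix (step 410).

    Undetermined adjustments are skipped. The result applies skew first,
    then scaling, then rotation (an assumed ordering).
    """
    m = IDENTITY
    if skew_k is not None:
        m = matmul(skew(*skew_k), m)
    if scale is not None:
        m = matmul(scaling(*scale), m)
    if rotation_rad is not None:
        m = matmul(rotation(rotation_rad), m)
    return m

def apply(m, x, y):
    """Apply an affine 3x3 matrix (bottom row 0, 0, 1) to a point."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

When no adjustment is needed (a user directly in front of the surface), the result is the identity, which corresponds to projecting the base image unchanged.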

In many implementations, image warping can be viewed in some ways as similar to image rectification, with various differences. Image rectification generally projects two images, each image having a different optical viewpoint, onto a common image plane. A matching pair of transformations, H and H′, can be used to rectify the pair of images. In contrast, while image warping also involves two images (a base image and a transformed image), only one image in the pair (the transformed image) is transformed. The base image never changes, and a single base image can be associated with many image transformations (and thus many transformed images) as the “optical viewpoint” of the user changes, such as when the pose of the user within the room changes. Image warping generates a single image transformation using a known “optical viewpoint” of a user (which in this context can be viewed as the determined image warping parameters) to match a transformed image to a known base image. In many implementations, mathematical techniques similar to those used in image rectification (sometimes with slight modifications) can be utilized in image warping, including planar rectification, cylindrical rectification, and/or polar rectification. For example, a base image can contain a calendar event for a user. Image warping can generate a single image transformation (in contrast to the matching pair of image transformations generated with image rectification) using a transformation parameter, such as the pose of the user within a room, in place of the “optical viewpoint” to generate a transformed image that corresponds with the known base image for the user. Additionally or alternatively, other transformation parameters can be utilized individually and/or in combination; for example, the pose and the gaze of a user can be utilized as the “optical viewpoint” of the user when generating the single image transformation to generate a transformed image that corresponds with the known base image.
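Because image warping needs only one transformation, a planar homography is a natural representation for it. That is an assumption here; the description names planar rectification among possible techniques but does not specify an estimator. The sketch below estimates such a homography from four point correspondences using the direct linear transform, with the bottom-right entry fixed to 1. The destination corners would be derived from the user's pose and the other warping parameters, and that derivation is outside this sketch.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def homography_from_points(src, dst):
    """Estimate the 3x3 homography H (H[2][2] = 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_h(h, x, y):
    """Map a point through a homography using homogeneous coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

In practice, a library routine such as OpenCV's `getPerspectiveTransform` computes the same kind of 3x3 matrix from four point pairs, and `warpPerspective` resamples an image with it.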

To further illustrate image adjustments that can be utilized in determining transformation parameters to warp a base image, examples of a variety of image adjustments are illustrated in FIGS. 5A-5D. Image 500 contains an example of image rotation, as illustrated in FIG. 5A. Image 500 contains a square 502. A rotation to the left of square 502 is illustrated as rotated square 504. It will be understood that this is merely an illustrative example, and rotated square 504 could be located in any of a variety of positions where rotated square 504 turns around an axis within image 500.

Image 525 contains an example of image scaling, as illustrated in FIG. 5B. Image 525 similarly contains square 502. However, square 502 is scaled to be larger and is illustrated as scaled square 526. Scaled square 526 is merely an illustrative example, as scaling can increase or decrease the size of an object.

Image 550 contains an example of image skewing, as illustrated in FIG. 5C. Square 502 is similarly contained in image 550. A skew transformation is performed on square 502, where square 502 is skewed to the right as skewed square 552. This example of image skewing is merely illustrative, and skewing can occur in any direction, including to the right, to the left, up, and/or down.

Image 575 contains an example of image translation, as illustrated in FIG. 5D. Square 502 as illustrated in image 575 is translated up and to the right to generate translated square 576. This example is merely illustrative, and image translations can occur in any direction. Additionally, the image transformations discussed in FIGS. 5A-5D can be performed on an image individually and/or in any of a variety of combinations.
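The four adjustments of FIGS. 5A-5D can be applied to the corners of a square. The sketch also shows why the order of a combination matters: translating and then rotating about the origin does not give the same result as rotating and then translating. The unit square standing in for square 502 and the helper names are hypothetical.

```python
import math

# Hypothetical unit square standing in for square 502 (corners listed counterclockwise).
SQUARE_502 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

def rotate(theta):
    """Rotation about the origin by theta radians (FIG. 5A)."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])

def scale(k):
    """Uniform scaling about the origin (FIG. 5B)."""
    return lambda p: (k * p[0], k * p[1])

def skew_right(k):
    """Horizontal skew: points higher in the image shift further right (FIG. 5C)."""
    return lambda p: (p[0] + k * p[1], p[1])

def translate(dx, dy):
    """Translation (FIG. 5D)."""
    return lambda p: (p[0] + dx, p[1] + dy)

def apply_ops(points, ops):
    """Apply each operation to every corner, in list order."""
    for op in ops:
        points = [op(p) for p in points]
    return points
```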

A process for dynamic image transformation including detecting an active user in accordance with various implementations is illustrated in FIG. 6. The process 600 can be performed by one or more client devices and/or any other apparatus capable of interacting with an automated assistant. The process includes identifying (602) a base image. As described above, a base image can be transformed in dependence on a pose of a user to generate a transformed image. In several implementations, identification of a base image can be performed in a manner similar to step (302) in FIG. 3.

An active user can be detected (604) by the client device. In some implementations, an active user is a user who is actively engaged with the automated assistant, and can be detected in a number of ways, including by movement, location, pose, facial identification (with approval of the user), voice identification (with approval of the user), and/or gaze. Active engagement can include viewing a projected image, listening to rendered audible content provided by the automated assistant, and/or providing input to the automated assistant (e.g., voice input, touch input, gestures, etc.). Sensors such as any of a variety of sensors included in presence sensors 104 and/or sensors included in third party agents such as “smart” devices can collect sensor data to detect an active user. For example, a user whom a microphone detects giving the client device the command “OK Assistant—show me my calendar for tomorrow” can be identified as an active user. Additionally, a user whom a camera detects, based on the user's gaze, looking at an image projected onto a surface by the client device can be identified as an active user. In many implementations, a combination of techniques can be used to detect an active user, such as identifying the pose and the facial identification (with approval of the user) of someone in a room to detect that the person is an active user. Additionally, multiple active users can be detected by a client device in the same room. Detecting an active user can map the detected active user to a user profile or some other sort of identification of the user. Additionally or alternatively, detecting an active user can indicate only that an active user is engaged with the automated assistant.
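A minimal sketch of combining engagement evidence from several sensors into a set of active users follows. The `Observation` fields are hypothetical stand-ins for real detector outputs, and `person_id` could equally be an anonymous track ID when no user profile is matched.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Hypothetical per-sensor evidence about one person in the room."""
    person_id: str
    sensor: str
    spoke_to_assistant: bool = False  # e.g., a microphone heard a command
    gaze_on_projection: bool = False  # e.g., a camera saw the person looking at the image
    touched_device: bool = False      # e.g., touch screen input

def is_engaged(obs):
    """Any single engagement signal is treated as sufficient (an assumption)."""
    return obs.spoke_to_assistant or obs.gaze_on_projection or obs.touched_device

def detect_active_users(observations):
    """Return the sorted IDs of everyone with at least one engagement signal."""
    return sorted({o.person_id for o in observations if is_engaged(o)})
```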

A first transformed image can be generated (606) for the active user by the client device. Any of a variety of dynamic image transformations, including image warping, can be used to generate the first transformed image. In many implementations, generating a first transformed image can be performed in a manner similar to steps (304)-(308) of FIG. 3.

The client device can cause (608) the projector to project the first transformed image onto a surface. In a variety of implementations, the first transformed image is a transformation of the base image. Projecting the first transformed image can be performed in a manner similar to step (310) of FIG. 3, including using a projector integrated with the client device and/or using a separate projector.

User movement can be detected (610) by the client device. In some implementations, the detection of movement can be performed in a manner similar to determining user movement in step (312) of FIG. 3. Additionally, in a variety of implementations, a threshold of movement must be met before a client device determines the second pose of the user. For example, if a user moves less than a millimeter, this small amount of movement is unlikely to produce a new transformed image that differs enough from the first transformed image to be worth generating, and system resources can be saved by waiting until the user moves a greater distance before determining the second pose of the user. On the other hand, if a user moves 3 meters to the right, similar to how the user moved to the right to the second pose 226 in FIG. 2B, a second transformed image can be substantially different from the first transformed image (depending on the content of the image). A movement threshold can be especially useful in implementations where gaze is utilized to determine, or as part of determining, user pose, as a user's eyes frequently make very small shifts. Computational resources can be saved by requiring a larger shift in gaze before a second transformed image for the active user is generated. Alternatively, in many implementations where user gaze is utilized, any projected images can be stabilized using a variety of image processing techniques to compensate for the small shifts a user's eyes are making.
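The movement threshold can be sketched as a tracker that commits a new pose, and thereby triggers a new transformed image, only when position or gaze has changed by more than a threshold. Both threshold values and the `PoseTracker` name are hypothetical.

```python
import math

POSITION_THRESHOLD_M = 0.25  # hypothetical minimum movement, in meters
GAZE_THRESHOLD_DEG = 5.0     # hypothetical minimum gaze shift, in degrees

class PoseTracker:
    """Keeps the last committed pose; small jitter does not trigger re-warping."""

    def __init__(self, position, gaze_deg):
        self.position = position
        self.gaze_deg = gaze_deg

    def update(self, position, gaze_deg):
        """Return True (and commit the new pose) only when a threshold is exceeded."""
        moved = math.dist(position, self.position) >= POSITION_THRESHOLD_M
        # Smallest angular difference, handling wrap-around at 360 degrees.
        gaze_shift = abs((gaze_deg - self.gaze_deg + 180.0) % 360.0 - 180.0)
        if moved or gaze_shift >= GAZE_THRESHOLD_DEG:
            self.position, self.gaze_deg = position, gaze_deg
            return True
        return False
```

Because the comparison is against the last committed pose rather than the previous sample, slow drift still accumulates and eventually triggers an update.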

A second transformed image can be generated (612) for the moved active user by the client device. Any of a variety of dynamic image transformations, including image warping, can be used to generate the second transformed image. In some implementations, generating a second transformed image can be performed in a manner similar to steps (312)-(316) of FIG. 3.

The client device can cause (614) the projector to project the second transformed image onto the surface. In many implementations, the second transformed image is a transformation of the base image or is a transformation of an additional image. Projecting the second transformed image can be performed in a manner similar to step (608) of FIG. 6 and/or step (318) of FIG. 3.

A process for dynamic image transformation for multiple active users in accordance with various implementations is illustrated in FIG. 7. The process 700 can be performed by one or more client devices and/or any other apparatus capable of interacting with an automated assistant. The process includes identifying (702) a base image. As described above, a base image can be transformed depending on a pose of a user to generate a transformed image. In several implementations, identification of a base image can be performed in a manner similar to step (302) of FIG. 3.

Multiple active users in a group of users can be detected (704) by a client device. In various implementations, many people can be in a room, but not all of them may be actively engaged with the client device. For example, a room can have two users with neither engaged with the client device (so the room has no active users), one of the two users can be interacting with the client device as an active user, and/or both users can be engaged with the client device as active users. In several implementations, multiple active users can be detected in a manner similar to how an individual active user is detected in step (604) of FIG. 6. Alternatively, active users can be detected by sensors in groups, and/or all active users in a room can be detected simultaneously. For example, sensors integrated into a client device could detect a cluster of active users in the same portion of the room at the same time. Moreover, some types of sensors can process an entire room at once and detect all active users in it simultaneously. For example, a camera with a 360-degree view can detect which users in the room are looking at a projected image and thus detect the active users simultaneously. Additionally, a combination of sensors can be used to detect a group of active users. For example, furniture in a room can block a camera from detecting a particular active user, but a microphone can detect that user's voice commands to the automated assistant and thereby determine that the user behind the furniture is in the group of active users. It should be readily appreciated that these examples are merely illustrative and any of a number of ways to detect active users can be utilized in accordance with various implementations.
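Combining sensors, as in the furniture example above, can be sketched as taking the union of users that any sensor reports as engaged. In this minimal sketch the `Detection` record, the sensor labels, and the rule that one sensor's report of engagement is sufficient are all illustrative assumptions, not details from this disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Detection:
    """One sensor's observation of one user (hypothetical record)."""
    user_id: str
    sensor: str    # e.g. "camera_360" or "microphone" (illustrative labels)
    engaged: bool  # looking at the projection or addressing the assistant


def detect_active_users(detections: Iterable[Detection]) -> Set[str]:
    """Return the users that at least one sensor reports as engaged.

    A user hidden from the camera (for example, behind furniture) is still
    counted as active if the microphone hears them address the assistant.
    """
    return {d.user_id for d in detections if d.engaged}
```

For instance, if the camera sees one engaged user and one disengaged user, and the microphone hears a third user issue a command, the first and third users form the group of active users.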

A first transformed image can be generated (706) for the multiple active users by the client device. Any of a variety of dynamic image transformations, including image warping, can be used to generate the first transformed image. In some implementations, generating a first transformed image can be performed in a manner similar to generating an image for a single active user, such as steps (304)-(308) of FIG. 3. Additionally, image warping parameters can take multiple user poses into account when generating the first transformed image. For example, if all the active users are clustered in a single area of a room, a client device can decide to treat the multiple active users in a way similar to a single active user when generating a transformed image. Alternatively, if most active users are clustered in a single area of a room and a single active user is in a second area of the room, the client device might largely ignore the active user in the second area and still generate a transformed image in a way similar to that for a single active user. In some implementations, if active users are more evenly spread throughout the room, the client device can make decisions during the image warping process to generate the first transformed image. For example, the client device could combine the poses of multiple active users in a meaningful way to generate a pose that can take the place of a single active user pose for use in generating a first transformed image. For example, the poses of multiple active users in a room can be averaged into a single pose. While this might not produce the best first image transformation for any individual active user, it can produce a reasonable compromise transformation for the group of users as a whole.
Additionally, in some implementations, a client device can perform a weighted averaging of the poses of multiple active users, giving more weight to the poses of predetermined users who can be identified using facial identification (with approval of the user) and/or voice identification (with approval of the user). For example, if the first transformed image is an event from a group calendar, the client device can determine which active users have access to that group calendar and give greater weight to their poses when computing the weighted average. In many implementations, there may be no “best” first transformed image to display for the entire group of active users, and the client device can simply display the base image in place of a transformed image.
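The averaging and weighted averaging of poses described above can be sketched as a weighted mean of positions. This minimal sketch assumes each pose is reduced to a 3D position (orientations are deliberately not averaged here, since a plain arithmetic mean of angles can be misleading), and the function name and weights are hypothetical. Equal weights give the plain average; larger weights could be assigned, for example, to identified users who have access to a group calendar.

```python
from typing import Optional, Sequence, Tuple

Position = Tuple[float, float, float]  # meters (hypothetical simplification)


def combined_position(
    positions: Sequence[Position],
    weights: Optional[Sequence[float]] = None,
) -> Position:
    """Weighted mean of active users' positions, usable as one stand-in pose."""
    if not positions:
        raise ValueError("need at least one active user")
    if weights is None:
        weights = [1.0] * len(positions)  # plain average
    if len(weights) != len(positions):
        raise ValueError("one weight per position is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return tuple(
        sum(w * p[axis] for w, p in zip(weights, positions)) / total
        for axis in range(3)
    )
```

For two users at (0, 0, 2) and (2, 0, 2), the plain average is (1.0, 0.0, 2.0); weighting the first user 3:1 moves the stand-in position to (0.5, 0.0, 2.0).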

The client device can cause (708) the projector to project the first transformed image onto a surface. In some implementations, the first transformed image is a transformation of the base image. Projecting the first transformed image can be performed in a manner similar to step (310) of FIG. 3, including using a projector integrated with the client device and/or using a separate projector.

Movement can be detected (710) in the group of users by the client device. In some implementations, the detection of movement in a group of users can be performed in a manner similar to determining user movement in step (312) of FIG. 3. Additionally, in many implementations, a threshold of movement may need to be met before movement is detected in the group of users, in a manner similar to the threshold of movement for a single user described in step (610) of FIG. 6.

A second group of multiple active users can be detected (712) by the client device. In many implementations, detecting a second group of multiple active users can be performed in a manner similar to detecting multiple active users in step (704). The active users in the second group can differ from the active users in the first group. The two groups can also overlap, although no overlap between active users in the first group and active users in the second group is required.

A second transformed image for the second group of multiple active users can be generated (714) using the client device. In various implementations, generating the second transformed image can be performed in a manner similar to generating the first transformed image for the multiple active users described in step (706).

The client device can cause (716) the projector to project the second transformed image onto the surface. In many implementations, the second transformed image is a transformation of the base image or is a transformation of an additional image. Projecting the second transformed image can be performed in a manner similar to step (614) of FIG. 6 and/or step (318) of FIG. 3.

Generating the base image can be based on the distance of the user from the projected image (i.e., the distance from the user to the surface the image is projected on). FIGS. 8A and 8B illustrate an example of a user viewing different base images projected onto a wall, where the projected base image is determined based on the user's distance from the projected image. Image 800, illustrated in FIG. 8A, contains a first scene of a room at a first time. Image 800 contains user 802, projected image 804, client device 806 (which includes an integrated projector and/or is in communication with a locally accessible separate projector), and table 808. The contents of the room in image 800 are merely illustrative; for example, the client device and/or projector can be separate devices, the client device and/or projector can be on a surface other than a table, such as a desk or a dresser, and/or mounted onto surfaces such as a wall and/or ceiling, more than one client device can be present within the room, more than one projector can be within the room, and/or more than one user can be in the room. Client device 806 can determine the pose of user 802 and can determine a distance from user 802 to projected image 804. The client device identifies a base image depending on the distance from user 802 to projected image 804. Projected image 804 can be projected as the identified base image and/or as a transformed image.

The same room is illustrated in FIG. 8B. Image 850 is a second scene of the room, captured at a second time, which contains the same client device 806 (which includes an integrated projector and/or is in communication with a locally accessible separate projector) and table 808. As an illustrative example, the user has moved to a second pose 852 much closer to projected image 854. Similarly to FIG. 8A, the client device can determine the second pose 852 of the user before determining the distance from the user at second pose 852 to projected image 854. The client device can then identify a second base image corresponding to the distance from the second pose 852 of the user to projected image 854. Projected image 854 can be projected as the second identified base image and/or as a second transformed image. Generally, the closer a user is to a projected image, the more detailed the UI elements contained in the identified base image can be. For example, the first pose of user 802 is farther away from the projected image than the second pose 852 of the user. As such, projected image 804 contains less detailed weather information, which includes weather information for today (for example, a graphical image of the sun and 72 degrees). In contrast, the second pose 852 of the user is much closer to the projected image, so projected image 854 contains more detailed information, which includes weather information for both today and tomorrow (for example, a graphical image of the sun and a temperature of 72 degrees indicated as today's weather, and a graphical image of a cloud with rain and a temperature of 54 degrees indicated as tomorrow's weather).

A process for generating a base image based on the distance of a user from a projected image in accordance with various implementations is illustrated in FIG. 9. The process 900 can be performed by one or more client devices and/or any other apparatus capable of interacting with an automated assistant. The process includes determining (902) the pose of a user by the client device. The pose of the user (including user location) can be determined in a manner similar to determining a pose of a first user as described in step (304) of FIG. 3.

A distance from the user to a projection surface can be determined (904) by the client device. The projection surface is the location where a projected image is displayed. In several implementations, a client device can determine the distance from the user to the projection surface using only the user pose. In many implementations, additional information, such as the distance from the client device to the projection surface, may be necessary to determine the distance from the user to the projection surface.
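When the projection surface is modeled as a plane, the user-to-surface distance follows from the standard point-to-plane formula |n · (p - p0)| / |n|, where p is the user position, p0 is any point on the surface, and n is the surface normal. This minimal sketch assumes the user position and the plane are expressed in the client device's coordinate frame, with the plane located using, for example, the device-to-surface distance mentioned above; the function name is hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def distance_to_surface(user: Vec3, plane_point: Vec3, plane_normal: Vec3) -> float:
    """Perpendicular distance (meters) from a user position to a planar surface.

    plane_point is any point on the surface; plane_normal need not be unit length.
    """
    norm = math.sqrt(sum(n * n for n in plane_normal))
    if norm == 0.0:
        raise ValueError("plane_normal must be non-zero")
    diff = [u - q for u, q in zip(user, plane_point)]
    dot = sum(d * n for d, n in zip(diff, plane_normal))
    return abs(dot) / norm
```

For example, with the projector at the origin facing a wall 3 m away along +z (plane point (0, 0, 3), normal (0, 0, 1)), a user at (1, 0, 0.5) is 2.5 m from the surface.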

A base image can be identified (906) using the distance from the user to the projection surface. Base images with more detailed UI elements are generally selected for users closer to the projection surface, while base images with less detailed UI elements are generally selected for users farther away from the projection surface. In several implementations, a base image with touch-sensitive UI elements can be identified for a user close enough to make physical contact with the projection surface. For example, a client device can select a base image with a full day of calendar information for a user who can touch the projection surface, and additionally, in many implementations, the user can touch the projected image to scroll through the projected calendar image.
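Identifying a base image from the user-to-surface distance can be sketched as a lookup over distance tiers, ordered from nearest to farthest, with a touch-enabled tier for users within reach of the surface. The tier boundaries (0.8 m and 2.5 m) and the image names below are hypothetical placeholders chosen for illustration, not values from this disclosure.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BaseImage:
    name: str
    touch_enabled: bool = False


# Hypothetical tiers: (maximum user-to-surface distance in meters, base image),
# ordered from nearest to farthest.
DEFAULT_TIERS: Tuple[Tuple[float, BaseImage], ...] = (
    (0.8, BaseImage("full_day_calendar", touch_enabled=True)),
    (2.5, BaseImage("weather_today_and_tomorrow")),
    (float("inf"), BaseImage("weather_today")),
)


def select_base_image(
    distance_m: float,
    tiers: Sequence[Tuple[float, BaseImage]] = DEFAULT_TIERS,
) -> BaseImage:
    """Return the first tier covering the distance: closer users get more detail."""
    for max_distance, image in tiers:
        if distance_m <= max_distance:
            return image
    raise ValueError("no tier covers this distance")
```

A table of tiers keeps the policy declarative, so the boundaries could be tuned per room or per projector without changing the selection logic.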

A transformed image can be generated (908) by the client device from the base image using the pose of the user. In many implementations, generating a transformed image can be performed in a manner similar to steps (304)-(308) of FIG. 3. In some implementations, process 900 can omit generation of a transformed image, and the generated base image can instead be projected.

The client device can cause (910) the projector to project the transformed image onto the surface. Projecting the transformed image can be performed in a manner similar to step (310) of FIG. 3.

FIG. 10 illustrates a block diagram of an example method to generate interfaces in an audio-based, networked system. The method 1100 can include receiving an input audio signal (1102). The method 1100 can include parsing the input audio signal (1104). The method 1100 can include selecting a first digital component and a second digital component (1106). The method 1100 can include determining a distance (1108). The method 1100 can include determining transformation parameters (1110). The method 1100 can include generating a first transformed digital component and a second transformed digital component (1112). The method 1100 can include transmitting the first transformed digital component and the second transformed digital component (1114).

The method 1100 can include receiving an input audio signal (1102). The method 1100 can include receiving, by the natural language processor, the input audio signal. The input audio signal can be detected by a microphone or other sensor located at a client device. The data processing system can receive the audio input in one or more portions or as a bulk or batch upload (e.g., multiple portions of the conversations uploaded in a single transmission to reduce the number of transmissions).

The method 1100 can include parsing the input audio signal (1104). The natural language processor can parse the input audio signal to identify a request and one or more keywords in the input audio signal. The request can be a request for a digital component. For example, the request can be for a digital component that includes images, video, text, audio files, or any combination thereof. The keywords can include terms that are relevant to, identified by, or associated with the requested digital component. A keyword can include one or more terms or phrases. For example, for a request that a digital component including the current weather in San Francisco be displayed by the client device on a wall or other projection surface, the keyword can be “weather” or “San Francisco.”

The method 1100 can include selecting a first digital component and a second digital component (1106). The first and second digital components can be base digital components, such as base images. The base digital components can include one or more image or video files. The content selector component can select the first base digital component based on the request parsed from the input audio signal. The content selector component can select the second base digital component based on the keyword identified from the input audio signal. For example, for the input audio signal “what is the weather in San Francisco,” the automated assistant can determine that the request is for the current weather of San Francisco to be presented. The first base digital component can be an image that includes graphics illustrating the current weather conditions and temperature in San Francisco. The automated assistant can select the second base digital component based on a keyword associated with the request, such as “San Francisco.” For example, the second base digital component can be an image that includes information about a popular restaurant located in San Francisco.
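The selection in step (1106) can be sketched with an in-memory catalog in which each base digital component carries topic tags: the first component is the first one matching the parsed request, and the second is the first other component matching a keyword. The catalog, tags, and component names are hypothetical; a production content selector would likely rank many candidates rather than take the first match.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DigitalComponent:
    name: str
    tags: FrozenSet[str]


def select_components(
    request_topic: str,
    keywords: Sequence[str],
    catalog: Sequence[DigitalComponent],
) -> Tuple[Optional[DigitalComponent], Optional[DigitalComponent]]:
    """Pick a first component for the request and a second one for a keyword."""
    first = next((c for c in catalog if request_topic in c.tags), None)
    second = next(
        (c for c in catalog
         if c is not first and any(k in c.tags for k in keywords)),
        None,
    )
    return first, second


# Hypothetical catalog mirroring the San Francisco example.
CATALOG = [
    DigitalComponent("sf_weather_now", frozenset({"weather", "san francisco"})),
    DigitalComponent("sf_popular_restaurant", frozenset({"restaurant", "san francisco"})),
]
```

For `select_components("weather", ["san francisco"], CATALOG)`, the first component is `sf_weather_now` and the second is `sf_popular_restaurant`, because the weather image is excluded from the keyword match once it has been chosen.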

The method 1100 can include determining a distance (1108). The automated assistant can determine the distance between the automated assistant (or the projector associated with the automated assistant) and the projection surface (e.g., a wall) onto which the digital components are going to be projected. The automated assistant can determine the distance using built-in range-finding sensors, such as ultrasonic or infrared sensors. Alternatively, the end user can input the distance to the automated assistant when configuring it. The distance can be determined each time an input audio signal is transmitted to the automated assistant, at predetermined intervals (e.g., daily or weekly), or during a configuration phase of the automated assistant.
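For an ultrasonic range finder, the distance follows from the round-trip time of an echo: d = v * t / 2, where v is the speed of sound (about 343 m/s in dry air at 20 degrees Celsius) and the factor of 2 accounts for the pulse traveling to the surface and back. This minimal sketch assumes the sensor reports the round-trip time in seconds; the function name is hypothetical, and real sensors typically also need temperature compensation and filtering of noisy echoes.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value for dry air at 20 degrees Celsius


def echo_distance_m(round_trip_s: float,
                    speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Distance to the projection surface from an ultrasonic echo's round-trip time."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the surface and back, so halve the path length.
    return speed_m_per_s * round_trip_s / 2.0
```

An echo returning after 17.5 ms corresponds to about 3.0 m (343 * 0.0175 / 2 = 3.00125 m).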

The method 1100 can include determining transformation parameters (1110). The automated assistant can determine the transformation parameters based at least on the distance between the client device (or associated projector) and the projection surface. The transformation parameters can also be based on a pose of the end user or the distance between the end user and the projection surface. The transformation parameters can correct for a skew in the projection of digital components onto the projection surface caused by the placement of the projector or the position of the user. For example, applying the transformation parameters can enable the automated assistant to perform keystone correction on the digital component. Without the transformation parameters, edges of the digital component that should be parallel may appear non-parallel to one another when projected onto the projection surface. The transformation parameters can correct for the skew such that those edges of the digital component are parallel with one another when projected onto the projection surface. The transformation parameters can include linear transformations. The transformation parameters can be stored locally at the client device or projector. For example, the digital components can be transmitted to the client device, which can apply the transformation parameters to generate the transformed digital component.
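Keystone correction is commonly modeled with a planar homography: a 3x3 matrix, linear in homogeneous coordinates, that maps the projector's frame to where that frame lands on the wall. The sketch below uses the standard four-point construction (direct linear transform), swapped in for illustration rather than taken from this disclosure. It assumes the wall positions of the projector frame's four corners are known (for example, measured from a camera image) and pre-warps the image so its corners land on a chosen rectangle on the wall, making its edges parallel. All function names and coordinates are hypothetical, and only pure Python is used so the block is self-contained.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = List[List[float]]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corners (three points collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix3:
    """3x3 homography (h33 fixed to 1) mapping four src points onto four dst points."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = _solve(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def map_point(h: Matrix3, p: Point) -> Point:
    """Map a point through h, dividing by the homogeneous coordinate."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def keystone_prewarp(frame_to_wall: Matrix3, frame_corners: Sequence[Point],
                     target_on_wall: Sequence[Point]) -> Matrix3:
    """Pre-warp so the image corners land on target_on_wall (e.g. a rectangle)."""
    wall_corners = [map_point(frame_to_wall, c) for c in frame_corners]
    wall_to_frame = homography(wall_corners, frame_corners)
    # Frame positions the projector must use to hit each target corner on the wall.
    dst = [map_point(wall_to_frame, t) for t in target_on_wall]
    return homography(frame_corners, dst)
```

For instance, if a projector's unit-square frame lands on the wall as the trapezoid (0.1, 0), (0.9, 0), (1, 1), (0, 1), pre-warping with `keystone_prewarp` toward the rectangle (0.1, 0), (0.9, 0), (0.9, 1), (0.1, 1) makes each image corner land on that rectangle's corners.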

The method 1100 can include generating a first transformed digital component and a second transformed digital component (1112). The first and second transformed digital components can be transformed images. The transformed version of a digital component can include the same content as the original digital component. The images of the transformed digital components can be adjusted or scaled such that the edges of the transformed images appear parallel to one another when projected onto the projection surface.

The method 1100 can include transmitting the first transformed digital component and the second transformed digital component (1114). The first and second transformed digital components can be transmitted to the client device to be projected onto the projection surface. In some cases, the transformation parameters and the base digital components can be transmitted to the client device instead. The client device can apply the transformation parameters to the base digital components prior to projecting them onto the projection surface.

FIG. 11 is a block diagram of an example computing device 1010 that may optionally be utilized to perform one or more aspects of techniques described herein. In some implementations, one or more of a client computing device, user-controlled resources module, and/or other component(s) may comprise one or more components of the example computing device 1010.

Computing device 1010 typically includes at least one processor 1014 which communicates with a number of peripheral devices via bus subsystem 1012. These peripheral devices may include a storage subsystem 1024, including, for example, a memory subsystem 1025 and a file storage subsystem 1026, user interface output devices 1020, user interface input devices 1022, and a network interface subsystem 1016. The input and output devices allow user interaction with computing device 1010. Network interface subsystem 1016 provides an interface to outside networks and is coupled to corresponding interface devices in other computing devices.

User interface input devices 1022 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a touchscreen incorporated into the display, audio input devices such as voice recognition systems, microphones, and/or other types of input devices. In general, use of the term “input device” is intended to include all possible types of devices and ways to input information into computing device 1010 or onto a communication network.

User interface output devices 1020 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem may include a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem may also provide non-visual display such as via audio output devices. In general, use of the term “output device” is intended to include all possible types of devices and ways to output information from computing device 1010 to the user or to another machine or computing device.

Storage subsystem 1024 stores programming and data constructs that provide the functionality of some or all of the modules described herein. For example, the storage subsystem 1024 may include the logic to perform selected aspects of the process of FIG. 3, as well as to implement various components depicted in FIGS. 1 and 2.

These software modules are generally executed by processor 1014 alone or in combination with other processors. Memory 1025 used in the storage subsystem 1024 can include a number of memories including a main random access memory (RAM) 1030 for storage of instructions and data during program execution and a read only memory (ROM) 1032 in which fixed instructions are stored. A file storage subsystem 1026 can provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a CD-ROM drive, an optical drive, or removable media cartridges. The modules implementing the functionality of certain implementations may be stored by file storage subsystem 1026 in the storage subsystem 1024, or in other machines accessible by the processor(s) 1014.

Bus subsystem 1012 provides a mechanism for letting the various components and subsystems of computing device 1010 communicate with each other as intended. Although bus subsystem 1012 is shown schematically as a single bus, alternative implementations of the bus subsystem may use multiple busses.

Computing device 1010 can be of varying types including a workstation, server, computing cluster, blade server, server farm, or any other data processing system or computing device. Due to the ever-changing nature of computers and networks, the description of computing device 1010 depicted in FIG. 11 is intended only as a specific example for purposes of illustrating some implementations. Many other configurations of computing device 1010 are possible having more or fewer components than the computing device depicted in FIG. 11.

In situations in which the systems described herein collect or otherwise monitor personal information about users, or may make use of personal and/or monitored information, the users may be provided with an opportunity to control whether programs or features collect user information (e.g., information about a user's social network, social actions or activities, profession, a user's preferences, or a user's current geographic location), or to control whether and/or how to receive content from the content server that may be more relevant to the user. Also, certain data may be treated in one or more ways before it is stored or used, so that personally identifiable information is removed. For example, a user's identity may be treated so that no personally identifiable information can be determined for the user, or a user's geographic location may be generalized where geographic location information is obtained (such as to a city, ZIP code, or state level), so that a particular geographic location of a user cannot be determined. Thus, the user may have control over how information is collected about the user and/or used. For example, in some implementations, users may opt out of having automated assistant 112 attempt to estimate their age range and/or vocabulary level.

While several implementations have been described and illustrated herein, a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein may be utilized, and each of such variations and/or modifications is deemed to be within the scope of the implementations described herein. More generally, all parameters, dimensions, materials, and configurations described herein are meant to be exemplary, and the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific implementations described herein. It is, therefore, to be understood that the foregoing implementations are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, implementations may be practiced otherwise than as specifically described and claimed. Implementations of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the scope of the present disclosure.

1-20. (canceled)
 21. A system to generate interfaces in audio-based environments, comprising: a data processing system having one or more components coupled with memory, the data processing system to: receive an input audio signal of a user acquired via a sensor of a client device, the client device communicatively coupled with a projector; parse the input audio signal to identify a request for content to present via the projector; select, from a plurality of base digital components, a base digital component using the request identified from the input audio signal, the base digital component having a first set of image frames; identify a distance between the user and a projection surface upon which the projector is to project; determine, based on the distance between the user and the projection surface, a transformation parameter to set at least one of a rotation, a scaling, a size, or a skew of the first set of image frames of the base digital component for projection onto the projection surface; generate a second set of image frames based on the first set of image frames of the base digital component and the transformation parameter; and provide the second set of image frames to the projector to project onto the projection surface.
 22. The system of claim 21, comprising the data processing system to: determine, concurrent with the projection of at least one of the second set of image frames, that a change in the distance between the user and the projection surface is greater than a threshold distance; and update, responsive to the determination, the transformation parameter based on the change in the distance.
 23. The system of claim 21, comprising the data processing system to: identify a plurality of distances between a corresponding plurality of users and the projector; and determine, based on at least a subset of the plurality of distances, the transformation parameter for the first set of image frames of the base digital component.
 24. The system of claim 21, comprising the data processing system to: identify a pose of the user relative to the projection surface, the pose including at least one of a position or an orientation of the user; and determine the transformation parameter based on the pose of the user relative to the projection surface.
 25. The system of claim 21, comprising the data processing system to: determine that a pose of the user relative to the projection surface satisfies a threshold condition; and select, responsive to the determination that the pose of the user satisfies the threshold condition, the base digital component including one or more interactive interface elements.
 26. The system of claim 21, comprising the data processing system to select, from the plurality of base digital components, the base digital component based on the distance identified between the user and the projection surface.
 27. The system of claim 21, comprising the data processing system to: parse the input audio signal to identify one or more keywords; select, from the plurality of base digital components, a second base digital component using the one or more keywords, the second base digital component including a third set of image frames; generate a fourth set of image frames based on the third set of image frames of the second base digital component and the transformation parameter; and provide the fourth set of image frames to the projector to project onto the projection surface with the second set of image frames.
 28. A system to generate interfaces in audio-based environments, comprising: a client device having one or more processors coupled with memory, the client device communicatively coupled with a projector, the client device to: receive, via a sensor, an input audio signal of a user; transmit, to a data processing system, the input audio signal to cause the data processing system to identify a request for content from parsing the input audio signal and select a base digital component from a plurality of base digital components based on the request; receive, from the data processing system, the base digital component including a first set of image frames; identify a distance between the user and a projection surface upon which the projector is to project; identify a transformation parameter to set at least one of a rotation, a scaling, a size, or a skew of the first set of image frames of the base digital component for projection onto the projection surface in accordance with the distance between the user and the projection surface; generate a second set of image frames based on the first set of image frames of the base digital component and the transformation parameter; and provide the second set of image frames to the projector to project onto the projection surface.
 29. The system of claim 28, comprising the client device to: determine, concurrent with the projection of at least one of the second set of image frames, that a change in the distance between the user and the projection surface is greater than a threshold distance; and update, responsive to the determination, the transformation parameter based on the change in the distance.
 30. The system of claim 28, comprising the client device to: identify a plurality of distances between a corresponding plurality of users and the projector; and identify, in accordance with at least a subset of the plurality of distances, the transformation parameter for the first set of image frames of the base digital component.
 31. The system of claim 28, comprising the client device to: identify a pose of the user relative to the projection surface, the pose including at least one of a position or an orientation of the user; and identify the transformation parameter based on the pose of the user relative to the projection surface.
 32. The system of claim 28, comprising the client device to: determine that a pose of the user relative to the projection surface satisfies a threshold condition; and cause the data processing system to select, responsive to the determination that the pose of the user satisfies the threshold condition, the base digital component including one or more interactive interface elements.
 33. The system of claim 28, comprising the client device to: determine that a pose of the user relative to the projection surface satisfies a threshold condition; and cause the data processing system to select, responsive to the determination that the pose of the user satisfies the threshold condition, the base digital component including one or more interactive interface elements.
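The claims leave the "threshold condition" on the pose open. A plausible reading, assumed here for illustration only, is that the user is within reach of the surface and roughly facing it, in which case an interactive variant of the base digital component is requested. `Pose`, `satisfies_interaction_threshold`, `component_variant`, the 1.2 m reach, and the 30° facing tolerance are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pose:
    """Hypothetical pose: floor-plane position (m) and facing direction (deg)."""
    x: float
    y: float
    yaw_deg: float  # 0 deg points along +x, counter-clockwise positive


def satisfies_interaction_threshold(pose: Pose, surface_xy: Tuple[float, float],
                                    max_distance_m: float = 1.2,
                                    max_facing_error_deg: float = 30.0) -> bool:
    """Assumed condition: user is within reach and roughly facing the surface."""
    dx, dy = surface_xy[0] - pose.x, surface_xy[1] - pose.y
    if math.hypot(dx, dy) > max_distance_m:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # wrap the angular difference into [-180, 180)
    error = (bearing - pose.yaw_deg + 180.0) % 360.0 - 180.0
    return abs(error) <= max_facing_error_deg


def component_variant(pose: Pose, surface_xy: Tuple[float, float]) -> str:
    """Label the client could send so the data processing system picks a variant."""
    if satisfies_interaction_threshold(pose, surface_xy):
        return "interactive"
    return "display_only"
```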
 34. The system of claim 28, comprising the client device to: transmit, to the data processing system, the input audio signal to cause the data processing system to identify one or more keywords from parsing the input audio signal and select a second base digital component from the plurality of base digital components using the one or more keywords, the second base digital component including a third set of image frames; generate a fourth set of image frames based on the third set of image frames of the second base digital component and the transformation parameter; and provide the fourth set of image frames to the projector to project onto the projection surface with the second set of image frames.
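Claim 34 projects the fourth set of image frames together with the second set. The sketch below shows one hypothetical way to combine two frame sequences, with frames again reduced to point lists: the output is as long as the longer sequence, the shorter sequence holds its last frame, and the secondary component is shifted by an assumed offset so the two occupy different regions of the surface. `compose_with` is an illustrative name, not part of the claims.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Frame = List[Point]  # assumption: a frame is reduced to a list of 2D points


def compose_with(primary: Sequence[Frame], secondary: Sequence[Frame],
                 offset: Point = (0.0, 0.0)) -> List[Frame]:
    """Overlay `secondary` (e.g. the fourth set) on `primary` (e.g. the second set).

    Hypothetical policy: the shorter sequence repeats its last frame, and
    secondary points are shifted by `offset`.
    """
    if not primary or not secondary:
        raise ValueError("both frame sequences must be non-empty")
    ox, oy = offset
    out: List[Frame] = []
    for i in range(max(len(primary), len(secondary))):
        a = primary[min(i, len(primary) - 1)]
        b = secondary[min(i, len(secondary) - 1)]
        out.append(list(a) + [(x + ox, y + oy) for x, y in b])
    return out
```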
 35. A method of generating interfaces in audio-based environments, comprising: receiving, by a data processing system, an input audio signal of a user acquired via a sensor of a client device, the client device communicatively coupled with a projector; parsing, by the data processing system, the input audio signal to identify a request for content to present via the projector; selecting, by the data processing system, from a plurality of base digital components, a base digital component using the request identified from the input audio signal, the base digital component having a first set of image frames; identifying, by the data processing system, a distance between the user and a projection surface upon which the projector is to project; determining, by the data processing system, based on the distance between the user and the projection surface, a transformation parameter to adjust at least one of a rotation, a scaling, a size, and a skew of the first set of image frames of the base digital component projected onto the projection surface; generating, by the data processing system, a second set of image frames based on the first set of image frames of the base digital component and the transformation parameter; and providing, by the data processing system, the second set of image frames to the projector to project onto the projection surface.
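On the data processing system side (claims 35 and 40), the parsed request or keywords are used to select base digital components. The sketch below substitutes plain keyword overlap for whatever parsing and ranking the system actually uses: it assumes the audio has already been transcribed, uses a hypothetical stop-word list, and breaks score ties by component id. With `k=2` it also returns a second base digital component. `BaseDigitalComponent`, `extract_keywords`, and `select_components` are illustrative names.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Set

# Hypothetical stop-word list; a real system would rely on its NLP pipeline.
STOPWORDS = frozenset({"a", "an", "the", "me", "show", "what", "what's",
                       "is", "for", "of", "on", "to", "please"})


@dataclass(frozen=True)
class BaseDigitalComponent:
    component_id: str
    keywords: FrozenSet[str]


def extract_keywords(transcript: str) -> Set[str]:
    """Lower-cased word tokens minus stop words (speech already transcribed)."""
    return {w for w in re.findall(r"[a-z0-9']+", transcript.lower())
            if w not in STOPWORDS}


def select_components(transcript: str, components: Sequence[BaseDigitalComponent],
                      k: int = 1) -> List[BaseDigitalComponent]:
    """Rank components by keyword overlap; ties broken by id for determinism."""
    kws = extract_keywords(transcript)
    scored = [(len(kws & c.keywords), c.component_id, c) for c in components]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [c for _, _, c in scored[:k]]
```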
 36. The method of claim 35, comprising: determining, by the data processing system, concurrent to the projection of at least one of the second set of image frames, that a change in the distance between the user and the projection surface is greater than a threshold distance; and updating, by the data processing system, responsive to the determination, the transformation parameter based on the change in the distance.
 37. The method of claim 35, comprising: identifying, by the data processing system, a plurality of distances between a corresponding plurality of users and the projector; and determining, by the data processing system, based on at least a subset of the plurality of distances, the transformation parameter for the first set of image frames of the base digital component.
 38. The method of claim 35, comprising: identifying, by the data processing system, a pose of the user relative to the projection surface, the pose including at least one of a position or an orientation of the user; and determining, by the data processing system, the transformation parameter based on the pose of the user relative to the projection surface.
 39. The method of claim 35, comprising: determining, by the data processing system, that a pose of the user relative to the projection surface satisfies a threshold condition; and selecting, by the data processing system, responsive to the determination that the pose of the user satisfies the threshold condition, the base digital component including one or more interactive interface elements.
 40. The method of claim 35, comprising: parsing, by the data processing system, the input audio signal to identify one or more keywords; selecting, by the data processing system, from the plurality of base digital components, a second base digital component using the one or more keywords, the second base digital component including a third set of image frames; generating, by the data processing system, a fourth set of image frames based on the third set of image frames of the second base digital component and the transformation parameter; and providing, by the data processing system, the fourth set of image frames to the projector to project onto the projection surface with the second set of image frames.